METHODS AND COMPOSITIONS FOR
IDENTIFYING OSTEOGENIC AGENTS

Technical Field 

The present invention relates to assay techniques for identifying agents which 
modulate bone growth. 

Background of the Invention 

Although there is a great deal of information available on the factors which
influence the breakdown and resorption of bone, information on growth factors which
stimulate the formation of new bone is more limited. Investigators have searched for
sources of such activities and have found that bone tissue itself is a storehouse for
factors which have the capacity for stimulating bone cells. Thus, extracts of bovine
tissue obtained from slaughterhouses contain not only structural proteins which are
responsible for maintaining the structural integrity of bone, but also biologically
active bone growth factors which can stimulate bone cells to proliferate. Among these
latter factors are transforming growth factor β, the heparin-binding growth factors
(acidic and basic fibroblast growth factor), the insulin-like growth factors
(insulin-like growth factor I and insulin-like growth factor II) and a recently
described family of proteins called bone morphogenetic proteins (BMPs). All of these
growth factors have effects on other types of cells as well as on bone cells.

The BMPs are novel factors in the extended transforming growth factor β family.
They were first identified in extracts of demineralized bone (Urist 1965, Wozney et al.,
1988). Recombinant BMP-2 and BMP-4 can induce new bone formation when they are
injected locally into the subcutaneous tissues of rats (Wozney 1992, Wozney & Rosen
1993). These factors are expressed by normal osteoblasts as they differentiate, and have
been shown to stimulate osteoblast differentiation and bone nodule formation in vitro as
well as bone formation in vivo (Harris et al., 1994). This latter property suggests potential
usefulness as therapeutic agents in diseases which result in bone loss.

The cells which are responsible for forming bone are osteoblasts. As osteoblasts
differentiate from precursors to mature bone-forming cells, they express and secrete a
number of the structural proteins of the bone matrix including Type-1 collagen, osteocalcin,
osteopontin and alkaline phosphatase (Stein et al., 1990, Harris et al., 1994). They also
synthesize a number of growth regulatory peptides which are stored in the bone matrix and
are presumably responsible for normal bone formation. These growth regulatory peptides
include the BMPs (Harris et al., 1994). In studies of primary cultures of fetal rat calvarial
osteoblasts, BMPs 1, 2, 3, 4, and 6 are expressed by cultured cells prior to the formation of
mineralized bone nodules (Harris et al., 1994). Expression of the BMPs coincides with
expression of alkaline phosphatase, osteocalcin and osteopontin.

Although the BMPs have powerful effects to stimulate bone formation in vitro and
in vivo, there are disadvantages to their use as therapeutic agents to enhance bone healing.
Receptors for the bone morphogenetic proteins have been identified in many tissues, and
the BMPs themselves are expressed in a large variety of tissues in specific temporal and
spatial patterns. This suggests that they may have effects on many tissues other than bone,
potentially limiting their usefulness as therapeutic agents when administered systemically.
Moreover, since they are peptides, they would have to be administered by injection. These
disadvantages are severe limitations to the development of BMPs as therapeutic agents.

It is an object of the present invention to overcome the limitations inherent in
known osteogenic agents by providing a method to identify potential drugs which would
stimulate production of BMPs locally in bone.

Prior Art 

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously functionally identified or isolated.

Disclosure of the Invention 

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein, operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

This assay technique specifically identifies osteogenic agents which stimulate bone
cells to produce bone growth factors in the bone morphogenetic protein family. These
osteogenic agents display the capacity to increase the activity of the promoters of genes of
members of the BMP family and other bone growth factors normally produced by, e.g., bone
cells.

Also provided in accordance with the present invention are isolated DNA sequences
encoding a promoter region of at least one bone morphogenetic protein, and a system for
identifying osteogenic agents comprising an expression vector comprising such promoter
sequences operatively linked to a reporter gene encoding an assayable product, and means
for detecting the assayable product produced in response to exposure to an osteogenic
compound.

Brief Description of the Drawings 

Figure 1A graphically depicts a restriction enzyme map of mouse genomic BMP-4
and a diagram of two transcripts. The mouse BMP-4 gene transcription unit is ~7kb and
contains 2 coding exons (closed boxes) and 3 non-coding exons, labeled exons 1A, 1B
and 2. This 19kb clone has an ~6kb 5'-flanking region and an ~7kb 3'-flanking region.
The diagram shows approximately 2.4kb of the 5'-flanking region, and a small region of
the 3'-flanking region. The lower panel shows two alternative transcripts of BMP-4.
Both have the same exons 2, 3 and 4 but a different exon 1. Transcript A has exon 1A and
transcript B has exon 1B, whose size was estimated according to RT-PCR and primer
extension analysis in FRC cells;

Figure 1B depicts the DNA sequence of selected portions of mouse genomic BMP-4
(SEQ. ID NO. 1) and the predicted amino acid sequences of the identified coding exons
(SEQ. ID NO. 2). The numbers on the right show the position of the nucleotide sequence
and the bold numbers indicate the location of the amino acid sequence of the coding region.
Most of the coding sequence is in exon 4. The end of the transcription unit was estimated
based on a 1.8kb transcript. Primer 1 in exon 1A was used in RT-PCR analysis with Primer
3 in exon 3. Primer 2 in exon 1B was used in RT-PCR analysis with Primer 3. Primers B1
and B2 were used in primer extension reactions;

Figure 1C portrays the sequence of the BMP-4 exon 1A 5'-flanking region and
potential response elements in the mouse BMP-4 1A promoter (SEQ. ID NO. 3). The
sequences of 2688 bp of the mouse BMP-4 gene are shown. Nucleotides are numbered on
the left with +1 corresponding to the major transcription start site of the 1A promoter. The
response elements of DR-1A Proximal and DR-1A Distal oligonucleotides are indicated.
The other potential response DNA elements in the boxes are p53, RB (retinoblastoma),
SP-1, AP-1, and AP-2. Primer A, indicated by the line above the DNA sequence at +114 to
+96, was used for primer extension analysis of exon 1A-containing transcripts;

Figure 2 depicts the results of a primer extension assay. Total RNAs prepared from
FRC cells (left frame) and 9.5-day mouse embryo (right frame) were used with
primer A or the complement of primer 2. Two major extended fragments, 67 and 115 bp,
indicated in lane A, were obtained from primer A. Two 1B primers, primer B1 and primer
B2, gave negative results with both FRC and mouse embryo total RNA as template.
Transcript B is not detectable with this assay. By RT-PCR, transcript B can be detected
and quantified;

Figure 3A is a photographic representation of gel electrophoresis of 1A-3 and 1B-3
RT-PCR products of the BMP-4 gene. RT-PCR was performed with two pairs of primers
using FRC cell poly(A)+ mRNA as the template. The products were verified by DNA
sequencing;

Figure 3B is a schematic diagram of spliced BMP-4 RT-PCR products with 1A and
1B exons in FRC cells. RT-PCR was performed with two pairs of primers using FRC cell
poly(A)+ mRNA as the template. The diagram shows where the primers are located in the
BMP-4 genomic DNA. RT-PCR product 1A-2-3, which contains exon 1A, exon 2 and the
5' region of exon 3, was produced with primer 1 and primer 3. Primer 2 and primer 3
generated two RT-PCR products with the exon 1B-2-3 pattern. The heterogeneity in size
of exon 1B is indicated. The 1A promoter is predominantly utilized in bone cells;

Figure 4A provides a map of the BMP-4 1A 5'-flanking-CAT plasmids and
promoter activity in FRC cells. The 2.6kb EcoRI and XbaI fragment, 1.3kb PstI fragment,
0.5kb SphI and PstI fragment, and 0.25kb PCR fragment were inserted into pBLCAT3.
The closed box indicates the non-coding exon 1A. The CAT box represents the CAT
reporter gene. The values represent percentages of CAT activity, with the activity
expressed by pCAT-2.6 set at 100%. The values represent the average of four
independent assays;

Figure 4B provides an autoradiogram of CAT assays using FRC cells transfected
with the BMP-4 1A 5'-flanking-CAT plasmids identified in Figure 4A;

Figure 5 portrays the nucleotide sequence of the mouse BMP-2 gene 5'-flanking
region from -2736 to +139 (SEQ. ID NO. 4). The transcription start site is denoted by +1;

Figure 6A depicts an autoradiogram showing products of a primer extension assay
for determination of the transcription start site of the BMP-2 gene, separated on an 8%
denaturing urea-polyacrylamide gel, in which Lane 1: total RNA from fetal rat calvarial
osteoblast cells, and Lane 2: control lane with 10ng of yeast tRNA. All RNA samples
were primed with a 32P-labeled oligonucleotide from exon 1 of the mouse BMP-2 gene, as
indicated in Figure 6B. Lane M: 32P-labeled MspI-digested λ phage DNA, containing
DNA fragments spanning from 623 bp to 15 bp (size marker);

Figure 6B provides a schematic representation of the primer extension assay. The
primer used is an 18mer synthetic oligonucleotide, 5'-CCCGGCAAGTTCAAGAAG-3'
(SEQ. ID NO. 5);

Figure 7 provides a diagram of selected BMP-2 promoter-luciferase reporter
constructs. BMP-2 5'-flanking sequences are designated by hatched boxes and
luciferase cDNA is designated by the filled box. Base +114 denotes the 3' end of the
BMP-2 gene in all the constructs;

Figure 8 displays the luciferase enzyme activity for the BMP-2 gene-LUC
constructs (shown in Figure 7) transfected in primary fetal rat calvarial osteoblasts (A),
HeLa cells (B) and ROS 17/2.8 osteoblasts (C). The luciferase activity has been
normalized to β-galactosidase activity in the cell lysates;

Figure 9A-F depicts the DNA sequence of the mouse BMP-2 promoter and gene
(SEQ. ID NO. 6); and

Figure 10A-D depicts the DNA sequence of the mouse BMP-4 promoter and gene
(SEQ. ID NO. 7).

Figure 11 depicts the resequencing of the BMP-2 5' flanking region.

Detailed Description of the Preferred Embodiments






WO 9O38590 PCT/US96/08197 

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

The present invention is distinguished from other techniques for identifying
bone-active compounds, as it specifically identifies chemical compounds, agents, factors or
other substances which stimulate bone cells to produce the bone growth factors in the bone
morphogenetic protein (BMP) family (hereinafter "osteogenic agents"). These osteogenic
agents are identified by their capacity to increase the activity of the promoters of genes of
members of the BMP family and other bone growth factors which are normally produced
by bone cells, and other cells including cartilage cells, tumor cells and prostatic cells. When
patients are treated with such chemical compounds, the relevant BMP will be produced by
bone cells and then be available locally in bone to enhance bone growth or bone healing.
Such compounds identified by this assay technique will be used for the treatment of
osteoporosis, segmental bone defects, fracture repair, prosthesis fixation or any disease
associated with bone loss.

Compounds that inhibit bone morphogenetic protein expression in bone or cartilage
may also be useful in clinical situations of excess bone formation which occurs in such
diseases as osteoblastic metastases or osteosclerosis of any cause. Such compounds can
also be identified in accordance with the present invention.

Also provided in accordance with the present invention are isolated DNA sequences
encoding a promoter region of at least one bone morphogenetic protein, and a system for
identifying osteogenic agents comprising an expression vector comprising such promoter
sequences operatively linked to a reporter gene encoding an assayable product, and means
for detecting the assayable product produced in response to exposure to an osteogenic
compound.

The promoters of the genes for BMP-4 and BMP-2 are complex promoters which
can be linked to reporter genes, such as, e.g., the firefly luciferase gene. When the hybrid
genes (for example, bone cell BMP-4 promoter or bone cell BMP-2 promoter and firefly
luciferase, chloramphenicol acetyl transferase (CAT) cDNAs, or cDNAs for other
reporter genes such as β-galactosidase, green fluorescent protein, human growth hormone,
alkaline phosphatase, β-glucuronidase, and the like) are transfected into bone cells,
osteogenic agents which activate the BMP-4 or BMP-2 promoters can be identified by their
capacity in vitro to increase luciferase activity in cell lysates after cell culture with the
agent.
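
As an illustration only, the readout described above can be reduced to a fold-induction calculation: reporter activity in lysates from agent-treated cells divided by activity in vehicle-treated cells. The sketch below is not part of the disclosed method. The function names, the 2-fold cutoff, and the luciferase readings are all hypothetical values chosen for the example.

```python
from statistics import mean

def fold_induction(treated, vehicle):
    """Mean reporter signal of treated wells divided by mean vehicle signal."""
    baseline = mean(vehicle)
    if baseline <= 0:
        raise ValueError("vehicle baseline must be positive")
    return mean(treated) / baseline

def call_hits(readings, vehicle_key="vehicle", threshold=2.0):
    """Return {compound: fold} for compounds at or above the fold threshold.

    The threshold is an arbitrary illustrative cutoff, not a value from the text.
    """
    vehicle = readings[vehicle_key]
    hits = {}
    for name, wells in readings.items():
        if name == vehicle_key:
            continue
        fold = fold_induction(wells, vehicle)
        if fold >= threshold:
            hits[name] = fold
    return hits

# Hypothetical luciferase readings (relative light units), not experimental data.
example = {
    "vehicle": [100.0, 110.0, 90.0],
    "compound_A": [310.0, 290.0, 300.0],
    "compound_B": [105.0, 95.0, 100.0],
}
hits = call_hits(example)
print(hits)
```

In practice the reporter signal would first be normalized to a co-transfected control and replicate variability would be assessed before any compound is called active; those steps are omitted here.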

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously identified or isolated, and methods for regulating transcription have not been
shown. The present invention isolates the promoters for the BMP genes and utilizes these
promoters in cultured bone cells so that agents can be identified which specifically
increase BMP-2 or BMP-4 production locally in bone. Since it is known that the BMPs are
produced by bone cells, a method for enhancing their production specifically in bone should
avoid systemic toxicity. This benefit is obtained by utilizing the unique tissue-specific
promoters for the BMPs which are provided herein, and then using these gene promoters to
identify agents which enhance their activity in bone cells.

By utilizing the disclosure provided herein, other promoters can be obtained from
additional bone morphogenetic proteins such as BMP-3, BMP-5, BMP-6, and BMP-7, to
provide comparable benefits to the promoters herein specifically described.

In addition, the present invention contemplates the use of promoters from additional
growth factors in osteoblastic cells. Included are additional bone morphogenetic proteins,
as well as fibroblast growth factors (e.g. FGF-1, FGF-2, and FGF-7), transforming growth
factors β-1, β-2, and β-3, insulin-like growth factor-1, insulin-like growth factor-2,
platelet-derived growth factor, and the like. Such promoters will readily be utilized in the
present invention to provide comparable benefits.

The cells which can be utilized in the present invention include primary cultures of
fetal rat calvarial osteoblasts, established bone cell lines available commercially (MC3T3-E1
cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2 cells, and the like
as provided in the catalog from the American Type Culture Collection (ATCC)), and bone
cell lines established from transgenic mice, as well as other cell lines capable of serving as
hosts for the present vectors and systems. In addition, a number of tumor cell lines also
express BMPs, including the prostate cancer cell lines PC3, LNCaP, and DU145, as well
as the human cancer cell line HeLa. Thus, any of a number of cell lines will find use in the
present invention, and the selection of an appropriate cell line will be a matter of choice for
a particular embodiment.

The following examples serve to illustrate certain preferred embodiments and
aspects of the present invention and are not to be construed as limiting the scope thereof.



EXPERIMENTAL 

In the experimental disclosure which follows, the following abbreviations apply: eq
(equivalents); M (Molar); mM (millimolar); μM (micromolar); N (Normal); mol (moles);
mmol (millimoles); μmol (micromoles); nmol (nanomoles); kg (kilograms); gm (grams); mg
(milligrams); μg (micrograms); ng (nanograms); L (liters); ml (milliliters); μl (microliters);
vol (volumes); and °C (degrees Centigrade).



Example 1: DESCRIPTION AND CHARACTERIZATION OF MURINE
BMP-4 GENE PROMOTER

(a) Library Screening, Cloning and Sequencing of Gene 

A mouse genomic Lambda FIX II spleen library (Stratagene, La Jolla, CA) was
screened with a mouse embryo BMP-4 cDNA kindly provided by Dr. B.L.M. Hogan
(Vanderbilt University School of Medicine, Nashville, TN). The probe was labeled with
[α-32P]dCTP using a random-primer labeling kit from Boehringer-Mannheim (Indianapolis,
IN). Plaque lift filters were hybridized overnight at 68°C in 6X SSC, 5X Denhardt's, 0.5% SDS
containing 200 μg/ml sonicated salmon sperm DNA, 10 μg/ml Poly A and 10 μg/ml tRNA.
The filters were washed at 55°C for 20 min, twice in 2X SSC, 0.1% SDS buffer and
once in 0.5X SSC, 0.1% SDS. The isolated phage DNA clones were analyzed according to
standard procedures (Sambrook et al., 1989).

Fragments from positive clones were subcloned into pBluescript vectors
(Stratagene, La Jolla, CA) and sequenced in both directions using the Sequenase
dideoxynucleotide chain termination sequencing kit (U.S. Biochemical Corp., Cleveland,
OH).

Three clones were isolated from 2x10* plaques of mouse spleen 129 genomic library
using full length coding region mouse embryo BMP-4 cDNA probe (B. Hogan, Vanderbilt
University, Nashville, TN). One 19kb clone contained 5 exons, an ~6kb 5'-flanking region
and an ~7kb 3'-flanking region, as shown in Figure 1A. The 7kb transcription unit and the
5'-flanking region of the mouse BMP-4 gene were sequenced (Figure 10).

The nucleotide sequence of selected portions of mouse BMP-4 and the deduced
amino acid sequence of the coding exons (408 residues; SEQ. ID NO. 2) are shown in Figure
1B. Primers used in the RT-PCR experiments described below are indicated in this Figure.

Figure 1C shows the DNA sequence of 2372bp of the 5'-flanking region and the
candidate DNA response elements upstream of exon 1A. Primers used in primer extensions
are also shown in Figures 1B and 1C.

(b) Primer Extension Mapping of the Transcriptional Start-Site of the Mouse BMP-4
Gene

The transcriptional start-sites were mapped by primer extension using the synthetic
oligonucleotide primer A 5'-CGGATGCCGAACTCACCTA-3' (SEQ. ID NO. 8),
corresponding to the complement of nucleotides +114 to +96 in the exon 1A sequence and
the oligonucleotide primer B1 5'-CTACAAACCCGAGAACAG-3' (SEQ. ID NO. 9),
corresponding to the complement of nucleotides +30 to +13 of the exon 1B sequence.
Total RNA from fetal rat calvarial (FRC) cells and 9.5 day mouse embryo (gift of B.
Hogan, Vanderbilt University) was used with both primers. The primer extension assay
was carried out using the primer extension kit from Promega (Madison, WI). The
annealing reactions were, however, carried out at 60°C in a water bath for 1 hr. The
products were then electrophoresed on 8% denaturing urea-polyacrylamide gels and
autoradiographed.
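
As a consistency check, not part of the original procedure, each primer's length can be compared with the inclusive span of the coordinates it is stated to complement, and the primer can be reverse-complemented to give the sense-strand sequence it anneals to. The sequences and coordinates below are copied from the text of this section (primer B2 is the additional exon 1B primer, 5'-CCCGGCACGAAAGGAGAC-3', +69 to +52); the helper names are hypothetical.

```python
# Primer sequences (5'->3') and the coordinates they are stated to complement.
PRIMERS = {
    "A": ("CGGATGCCGAACTCACCTA", 114, 96),   # exon 1A
    "B1": ("CTACAAACCCGAGAACAG", 30, 13),    # exon 1B
    "B2": ("CCCGGCACGAAAGGAGAC", 69, 52),    # exon 1B
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def span_length(first, last):
    """Inclusive length of a span whose two ends lie on the same side of +1."""
    return abs(first - last) + 1

# True where the primer length equals the length of its stated span.
length_ok = {
    name: len(seq) == span_length(first, last)
    for name, (seq, first, last) in PRIMERS.items()
}
# Sense-strand sequence each primer would anneal to, written 5'->3'.
sense_strand = {name: reverse_complement(seq) for name, (seq, _, _) in PRIMERS.items()}

for name in PRIMERS:
    print(name, length_ok[name], sense_strand[name])
```

All three primers pass the length check (19, 18 and 18 nucleotides respectively), which supports the coordinates as transcribed.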

One additional oligonucleotide primer B2 5'-CCCGGCACGAAAGGAGAC-3'
(SEQ. ID NO. 10), corresponding to the complement of nucleotide sequence +69 to +52 of
exon 1B, was also utilized in primer extension reactions with FRC and mouse embryo
RNAs.

1. Evidence for utilization of two alternate exon 1 sequences for the BMP-4 gene.

Several BMP-4 cDNAs were sequenced from the prostate cancer cell line PC-3 and from
primary FRC cells. Four independent FRC cell BMP-4 cDNAs all contained exon 1A.
However, the human prostate carcinoma cell line (PC-3) cDNA contained an apparently
unique exon 1B sequence spliced to exon 2 (Chen et al., 1993). A double-stranded
oligonucleotide probe (70bp) to exon 1B was synthesized based on the human PC-3 exon
1B sequence. This exon 1B probe was then used to identify the exon 1B region in the
mouse genomic BMP-4 clone. The candidate exon 1B is 1696bp downstream from the 3'
end of exon 1A.

2. Primer extension analysis 

Primer extension analysis was performed to map the mouse BMP-4 gene
transcription start sites. Primer A, an oligonucleotide from exon 1A, and two
oligonucleotides from exon 1B were used. Total RNA from both mouse embryo and
FRC cells was utilized. As shown in Figure 2, a major extended fragment from primer A,
which migrates at 115bp, was obtained with both mouse embryo and FRC cell total RNAs.
The extended 5'-end of the 115bp fragment represents the major transcription start site for
1A-containing transcripts. The size of this 5' non-coding exon 1A is 306bp. A major
extended fragment from the complement of primer B1 (exon 1B) was not detected using
either mouse embryo or FRC cell total RNA. One other primer from exon 1B also gave
negative results. Exon 1B-containing transcripts were therefore not detectable in 9.5 day
mouse embryo and FRC cells by this assay, which suggests that transcripts containing
exon 1B are less abundant in these cells and tissues than transcripts containing exon 1A.
All primer extensions were carried out after annealing of primers at high stringency.
Lower stringency annealing with 1B primers gave extended products not associated with
BMP-4 mRNA.

(c) BMP-4 Gene 5' Flanking Region for Exon 1A and 1B Transcripts.

Four FRC BMP-4 cDNAs were sequenced and found to contain exon 1A sequences
spliced to exon 2. The human U2OS BMP-4 cDNA sequence also contains exon 1A
(Wozney et al., 1988). This suggests that the BMP-4 gene sequences upstream of exon 1A
are used primarily in bone cells.

To test whether the BMP-4 1B promoter is utilized at all in FRC cells,
oligonucleotide primers were designed to ascertain whether spliced 1B-2-3 exon products
and 1A-2-3 exon (control) products could be obtained by the more sensitive RT-PCR
technique using FRC poly(A)+ RNA. The 3' primer was in exon 3 (Figure 1B, Primer 3)
and the 5' primers were either in exon 1A (primer 1) or exon 1B (primer 2).

The RT-PCR products were cloned and sequenced. A photograph and diagram of
the products obtained are presented in Figures 3A and 3B. Both 1A-2-3 and 1B-2-3
products were obtained. The results indicate that FRC osteoblasts produce transcripts with
either a 1A exon or a 1B exon, but not both. This suggests that the intron region between
the 1A and 1B exons could contain regulatory response elements under certain conditions.
Of the 1B-2-3 RT-PCR products obtained from FRC osteoblasts, two products were obtained
with different 3' splice sites for exon 1B. By comparison with the genomic DNA, both
3' ends of the two exon 1Bs have reasonable 5' splice consensus sequences, consistent with
an alternate splicing pattern obtained for the 1B-2-3 RT-PCR products. Most importantly,
no 1A-1B-2-3 RT-PCR splice products of the BMP-4 gene were obtained. Thus, 1B does
not appear to be an alternatively spliced 5' non-coding exon. By quantitative RT-PCR, it
was shown that 1A transcripts are 10 to 15X more abundant than 1B transcripts in primary
bone cells.

The RT-PCR technique was performed as follows. First-strand cDNA was
synthesized from 1 μg FRC cell poly(A)+ RNA with an 18mer dT primer using
Superscript™ reverse transcriptase (Gibco BRL) in a total volume of 20 μl. The cDNA
was then used as a template for PCR with two sets of synthesized primers. As shown in
Figure 1B, primer 1 (5'-GAAGGCAAGAGCGCGAGG-3') (SEQ. ID NO. 11),
corresponding to a 3' region of exon 1A, and primer 3 (5'-CCGGTCTCAGGTATCA-3')
(SEQ. ID NO. 12), corresponding to a 5' region of exon 3, were used to generate the exon
1A-2-3 spliced PCR product. Primer 2 (5'-CAGGCGGAAAGCTGTTC-3') (SEQ. ID NO.
13), corresponding to a 3' region (+2 to +18) of exon 1B, and primer 3 were used to
generate exon 1B-2-3 spliced PCR products. The GeneAmp PCR kit was used according to
the manufacturer's procedure (Perkin-Elmer/Cetus, Norwalk, CT). Each cycle consisted of a
denaturation step (94°C for 1 min), an annealing step (59°C for 2 min) and an elongation
step (72°C for 1 min). The PCR products were analyzed by agarose gel electrophoresis for
size determination. The products were subcloned into the pCR II vector using the TA cloning
kit (Invitrogen, San Diego, CA). The inserts were sequenced in both directions with a
sequencing kit from U.S. Biochemical (Cleveland, OH).

Northern analysis demonstrated that the single 1.8kb BMP-4 transcript detected in
FRC cells during bone cell differentiation hybridizes to both a pure 1A exon probe and an
exon 2-4 probe. The ratio of the 1A to 2-4 signal is constant through the changing levels of
BMP-4 expression during differentiation. Using a 1B exon probe, no detectable
hybridization to the BMP-4 exon 2-4 1.8kb signal was observed. This again indicates that
1A-containing transcripts predominate in bone cells, although 1B transcripts can be
detected by the more sensitive PCR method. By quantitative PCR it was shown that 1A
transcripts are 10-15X more abundant than 1B in FRC cells.

(d) BMP-4 Promoter 1A Plasmid Construction and Transfection, and Detection of
Promoter Activity in Osteoblasts.

Three BMP-4 1A promoter plasmids were constructed by excising fragments from
the 5' flanking region of the mouse BMP-4 gene and cloning them into pBLCAT3 expression
vectors (Luckow and Schutz, 1987). The pCAT-2.6 plasmid was the pBLCAT3 vector
with a 2.6kb EcoRI and XbaI fragment (-2372/+258) of the BMP-4 gene. The pCAT-1.3
plasmid was similarly generated from a 1.3kb PstI fragment (-1144/+212). The pCAT-0.5
plasmid was made from a 0.5kb SphI and PstI fragment (-260/+212). Both the pCAT-1.3
and the pCAT-0.5 plasmids have 212bp of exon 1A non-coding region. An additional
promoter plasmid was created from a PCR-amplified product, corresponding to the 240bp
sequence between nucleotides -25 and +212, and referred to as pCAT-0.24. The
amplified fragment was first cloned into the pCR II vector using the TA cloning kit
(Invitrogen, San Diego, CA) and then the fragment was released with HindIII and XhoI, and
religated into pBLCAT3. Correct orientation of all inserts with respect to the CAT vector
was verified by DNA sequencing.
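
The nominal sizes in the plasmid names can be checked against the stated end coordinates. The short sketch below assumes the usual convention that promoter numbering has no position 0 (-1 is immediately followed by +1); that convention is an assumption, since the text does not state it, and the function name is hypothetical.

```python
def fragment_length(start, end):
    """Length in bp of a fragment numbered relative to +1.

    Assumes there is no position 0, so a fragment spanning the start site
    contains |start| upstream bases plus `end` downstream bases.
    """
    if start < 0 < end:
        return -start + end
    return abs(end - start) + 1

# End coordinates as given in the text for each CAT construct.
CONSTRUCTS = {
    "pCAT-2.6": (-2372, 258),
    "pCAT-1.3": (-1144, 212),
    "pCAT-0.5": (-260, 212),
    "pCAT-0.24": (-25, 212),
}

sizes_bp = {name: fragment_length(s, e) for name, (s, e) in CONSTRUCTS.items()}
for name, size in sizes_bp.items():
    print(f"{name}: {size} bp ({size / 1000:.2f} kb)")
```

Under that assumption the inserts come to 2630, 1356, 472 and 237 bp, in line with the rounded 2.6kb, 1.3kb, 0.5kb and 0.24kb designations.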

The cells used for transient transfection studies were isolated from 19 day-old fetal
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows
et al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed
and cleaned by washing in α minimal essential medium (αMEM) containing 10% V/V fetal
calf serum (FCS) and antibiotics. The bones were minced with scissors and were
transferred to a 35mm tissue culture dish containing 5 ml of sterile bacterial collagenase
(0.1%) and trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells
released at this time were collected and immediately mixed with an equal volume of FCS to
inactivate trypsin. This procedure was repeated 6 times to release cells at 20 min intervals.
Cells released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were
combined and collected by centrifugation at 40 x g for 5 min. The cells were
then plated in αMEM containing 10% FCS and antibiotics and were grown to confluence
(2-3 days). At this stage the cells were plated for transfection in 60mm tissue culture
dishes at a cell density of 5 x 10^3 cells per dish. These primary osteoblast cultures are
capable of self-organizing into bone-like structures in prolonged cultures (Bellows et al.,
1986; Harris et al., 1994). HeLa, ROS 17/2.8, and CV-1 cells were purchased from the
ATCC.

The isolated FRC cells, enriched for the osteoblast phenotype, were used as
recipient cells for transient transfection assays. BMP-4 mRNA is modulated in these cells
in a transient fashion during prolonged culture (Harris et al., 1994b). The technique of
electroporation was used for DNA transfection (Potter, 1988; van den Hoff et al., 1992).
After electroporation, the cells were divided into aliquots, replated in 100mm diameter
culture dishes and cultured for 48 hours in modified Eagle's minimal essential media
(MEM, GIBCO, Grand Island, NY) with 10% fetal calf serum (FCS). The extracts were
assayed for CAT activity according to the method described by Gorman (1988), and CAT
activity was normalized by β-galactosidase assay according to the method of Rouet et al.
(1992).

Forty-eight hours after transfection with the various BMP-4-CAT reporter gene plasmid
constructs, the cells were harvested and the CAT activity was determined. As indicated in
Figures 4A and 4B, the pCAT-0.24 plasmid (-25/+212) has little CAT activity. This plasmid
contains -25 to +212 of the 5' non-coding exon 1A, and its activity was 3-fold lower than
that of the parent pBLCAT3 plasmid. The pCAT-0.5 (-260/+212), pCAT-1.3 (-1144/+212),
and pCAT-2.6 (-2372/+258) plasmids showed progressively increasing CAT activity when
transfected into FRC cells. These data are shown in Figure 4B. With pCAT-0.5 (-260/+212)
there is a 10-fold increase in CAT activity relative to pCAT-0.24 (-25/+212). pCAT-1.3
(-1144/+212) shows a further 6-fold increase and pCAT-2.6 (-2372/+258) shows a further
2-fold change over pCAT-1.3 (-1144/+212). Thus the net increase in CAT activity between
pCAT-0.24 (-25/+212) and pCAT-2.6 (-2372/+258) in FRC cells is approximately 100-fold.
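
The "approximately 100-fold" figure follows from multiplying the stepwise increases reported above. The sketch below simply reproduces that arithmetic; the step labels are descriptive only, and the fold values are the rounded ones given in the text.

```python
from math import prod

# Stepwise fold increases in CAT activity as reported in the text.
STEPS = [
    ("pCAT-0.24 -> pCAT-0.5", 10),
    ("pCAT-0.5 -> pCAT-1.3", 6),
    ("pCAT-1.3 -> pCAT-2.6", 2),
]

# Successive fold changes compound multiplicatively.
cumulative_fold = prod(fold for _, fold in STEPS)
print(cumulative_fold)  # 120
```

The product of the rounded steps is 120, consistent with the stated net increase of approximately 100-fold.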

Example 2: DESCRIPTION AND CHARACTERIZATION OF
MURINE BMP-2 GENE PROMOTER

(a) Cloning of Mouse BMP-2 Genomic DNA.

Genomic clones of the mouse BMP-2 gene were isolated in order to determine the 
transcriptional regulation of the BMP-2 gene in primary osteoblasts. 5 x 10^6 plaques were 
screened from a mouse genomic library, B6/CBA (purchased from Stratagene, San Diego, 
CA), using BMP-2 cDNA as probe. The BMP-2 cDNA clone was isolated from a cDNA 
library of PC3 prostate cancer cells (Harris et al., 1994). The human BMP-2 probe was a 
1.1 kb SmaI fragment containing most of the coding region. 

The BMP-2 genomic clones were sequenced by the dideoxy chain termination method 
(Sanger et al., 1977), using deoxyadenosine 5'-[α-[35S]thio]triphosphate and Sequenase 
(United States Biochemical, Cleveland, OH). All fragments were sequenced at least twice 
and overlaps were established using the appropriate oligonucleotide primers. Primers were 
prepared on an Applied Biosystems Model 392 DNA Synthesizer. Approximately 16 kb of 
one of these BMP-2 clones was completely sequenced (Figure 9). Analysis of this 
sequence showed that the mouse BMP-2 gene contains one noncoding and two coding exons 
(Feng et al., 1994). Analysis of the 5' flanking sequence showed that the BMP-2 gene does 
not contain typical TATA or CAAT boxes. However, a number of putative response 
elements and transcription factor recognition sequences were identified upstream of exon 1 
(Figure 5). The 5'-flanking region is GC rich with several SP-1, AP-1, P53, E-box, 
homeobox, and AP-2 candidate DNA binding elements. 

(b) Analysis of Transcription Start Site for BMP-2 Gene. 

The transcription start sites for the BMP-2 gene were identified using the primer 
extension technique. Primer extension was carried out as described (Hall et al., 1993). 
The primer used was a 32P-labeled 18 mer oligonucleotide 5'-CCCGGCAATTCAAGAAG-3' 
(SEQ ID NO: 5). Total RNA obtained from primary fetal rat calvarial osteoblasts was 
used for the primer extension. The results are shown in Figure 6. The major extension 
product was 68 bp and was used to estimate the major transcription start site (+1, Figure 
5). These results were confirmed by RNase protection assays. 

(c) Identification of BMP-2 Promoter and Enhancer Activity Using Luciferase (LUC) 
Reporter Gene Constructs. 

The BMP-2-LUC constructs (Figure 7) were designed to contain variable 5' 
boundaries from BMP-2 5'-flanking sequences spanning the transcription start site (+1). 
Each construct contained the 3' boundary at +114 in exon 1 (Figure 6). These constructs 
were individually transfected into primary cultures of fetal rat calvarial osteoblasts, ROS 
17/2.8 osteosarcoma cells, HeLa cells, and CV-1 cells by the calcium-phosphate 
precipitation technique, and the promoter activity for each of these constructs was assayed 
24 hrs following transfection by measuring the luciferase enzyme activity in each 
individual cell lysate. The LUC (luciferase enzyme assay) technique is described below 
under (f). Plasmid pSVβGal was co-transfected with each plasmid construct to normalize 
for the transfection efficiency in each sample. The experiments were repeated at least five 
times in independent fetal rat calvarial cultures, with each assay done in triplicate. The 
mean values from a representative experiment are shown in Figure 8. 

(d) Isolation of Primary Fetal Rat Calvarial Osteoblasts for Functional Studies 
of BMP-2 Gene Promoter. 

The cells used for transient transfection studies were isolated from 19 day-old fetal 
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows et 
al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed 
and cleaned by washing in α minimal essential medium (αMEM) containing 10% v/v fetal 
calf serum (FCS) and antibiotics. The bones were minced with scissors and transferred 
to a 35 mm tissue culture dish containing 5 ml of sterile bacterial collagenase (0.1%) and 
trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells released at this 
time were collected and immediately mixed with an equal volume of FCS to inactivate 
trypsin. This procedure was repeated 6 times to release cells at 20 min intervals. Cells 
released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were combined and 
collected by centrifugation at 400 g for 5 min. The cells were then plated in 
αMEM containing 10% FCS and antibiotics and were grown to confluency (2-3 days). At 
this stage the cells were plated for transfection in 60 mm tissue culture dishes at a cell 
density of 5 x 10^5 cells per dish. These primary osteoblast cultures are capable of forming 
mineralized bone in prolonged cultures (Bellows et al., 1986; Harris et al., 1994). HeLa, 
ROS 17/2.8, and CV-1 cells were purchased from the ATCC. 

(e) Transient Transfection Assay. 

For the transient transfection assay, the primary osteoblast cells were plated at the 
above mentioned cell density 18-24 hrs prior to transfection. The transfection was carried 
out using a modified calcium-phosphate precipitation method (Graham & van der Eb 1973; 
Frost & Williams 1978). The cells were incubated for 4 hrs at 37°C with 500 µl of a 
calcium phosphate precipitate of plasmid DNA containing 10 µg of reporter plasmid 
construct and 1 µg of pSVβGal (for normalization of transfection efficiency) in 0.15 M 
CaCl2 and Hepes buffered saline (21 mM Hepes, 13.5 mM NaCl, 5 mM KCl, 0.7 mM 
Na2HPO4, 5.5 mM dextrose, pH 7.05-7.1). After the 4 hr incubation of cells with 
precipitate, the cells were subjected to a 2 min treatment with 15% glycerol in αMEM, 
followed by addition of fresh αMEM containing insulin, transferrin and selenium (ITS) 
(Upstate Biotechnology, Lake Placid, NY). The cells were harvested 24 hrs post 
transfection. 

(f) Luciferase and β-galactosidase Assay. 

Cell lysates were prepared and the luciferase enzyme assay was carried out using assay 
protocols and the assay kit from Promega (Madison, WI). Routinely, 20 µl of cell lysate 
was mixed with 100 µl of luciferase assay reagent (270 µM coenzyme A, 470 µM luciferin 
and 530 µM ATP) and the luciferase activity was measured for 10 sec in a Turner 
TD-20e luminometer. The values were normalized with respect to the β-galactosidase 
enzyme activity obtained for each experimental sample. 

The β-galactosidase enzyme activity was measured in the cell lysate using a 96 well 
microtiter plate according to Rouet et al. (1992). 10-20 µl of cell lysate was added to 90-80 µl 
of β-galactosidase reaction buffer containing 88 mM phosphate buffer, pH 7.3, 11 mM 
KCl, 1 mM MgCl2, 55 mM β-mercaptoethanol, and 4.4 mM chlorophenol red 
β-D-galactopyranoside (Boehringer-Mannheim Corp., Indianapolis, IN). The reaction 
mixture was incubated at 37°C for 30-60 min, depending on transfection efficiency, and the 
samples were read with an ELISA plate reader at 600 nm. 
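
The normalization step described in (f) can be sketched as follows. This is an illustrative Python example only: it assumes that each lysate yields one luciferase reading and one β-galactosidase reading and that the normalized value is their ratio, averaged over replicates. The function name and the sample readings are hypothetical and do not come from the original experiments.

```python
from statistics import mean

def normalized_activity(luciferase_readings, bgal_readings):
    """Mean luciferase/beta-galactosidase ratio over paired replicate lysates."""
    if len(luciferase_readings) != len(bgal_readings):
        raise ValueError("each luciferase reading needs a matching beta-gal reading")
    ratios = []
    for luc, bgal in zip(luciferase_readings, bgal_readings):
        if bgal <= 0:
            raise ValueError("beta-galactosidase reading must be positive")
        # Dividing by the co-transfected control corrects for transfection efficiency.
        ratios.append(luc / bgal)
    return mean(ratios)

# Hypothetical triplicate readings in arbitrary units, for illustration only.
luc = [1200.0, 1500.0, 900.0]
bgal = [0.5, 0.5, 0.25]
print(normalized_activity(luc, bgal))  # 3000.0
```

Taking the ratio per lysate before averaging keeps each luciferase value paired with the transfection efficiency of the same well.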

(g) Plasmid Construction 

The luciferase basic plasmid (pGL basic) was the vector used for all constructs 
(purchased from Promega, Madison, WI). Different lengths of DNA fragments from the 
BMP-2 5'-flanking region were cloned at the multiple cloning site of this plasmid, which is 
upstream of the firefly luciferase cDNA. The BMP-2 DNA fragments were isolated either 
by using available restriction enzyme sites (constructs -196/+114, -876/+114, -1995/+114, 
-2483/+114, and -2736/+114) or by polymerase chain reaction using specific oligonucleotide 
primers (constructs -23/+114, -123/+114 and +29/+114). 

The minimal promoter activity for the BMP-2 gene was identified in the shortest 
construct containing 23 bp upstream of the transcription start site (-23/+114). No luciferase 
activity was noted in the construct that did not include the transcription start site 
(+29/+114). Two other constructs containing increasing lengths of 5' sequences up to 
-196 bp showed reproducible decreases in promoter activity in fetal rat calvarial osteoblasts 
and HeLa cells (Figure 8). The -876/+114 construct showed a 5-fold increase in activity in 
HeLa cells. The -1995/+114, -2483/+114 and -2736/+114 constructs showed decreased 
promoter activity when compared to the -876/+114 construct only in HeLa cells (Figure 8). 

In the primary fetal rat calvarial osteoblasts, the 2.6 kb construct (-2483/+114) 
demonstrated a 2-3-fold increase in luciferase activity over that of the -1995/+114 
construct (Figure 8). These results suggest that one or more positive response regions are 
present between -196 and -1995 and that the DNA sequence between -1995 and -2483 bp 
has other positive regulatory elements that could modulate BMP-2 transcription. The 
largest 2.9 kb construct (-2736/+114) repeatedly demonstrated a 20-50% decrease in 
promoter activity compared to the -2483/+114 construct in these primary fetal rat calvarial 
osteoblasts (Figure 8). 
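
The construct sizes quoted above can be checked against their coordinates. The sketch below is illustrative Python and assumes the usual promoter numbering convention in which there is no position 0, so that -1 is immediately followed by +1. The function name is hypothetical.

```python
def insert_length(five_prime, three_prime):
    """Length in bp of a fragment given promoter-style coordinates.

    Assumes there is no position 0: -1 is immediately followed by +1
    (the transcription start site).
    """
    if five_prime == 0 or three_prime == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if five_prime > three_prime:
        raise ValueError("5' boundary must lie upstream of the 3' boundary")
    length = three_prime - five_prime + 1
    if five_prime < 0 < three_prime:
        length -= 1  # the span crosses the -1/+1 junction
    return length

# Construct boundaries as listed in section (g).
constructs = [(-2736, 114), (-2483, 114), (-1995, 114), (-876, 114),
              (-196, 114), (-123, 114), (-23, 114), (29, 114)]
for five, three in constructs:
    print(f"{five:+d}/{three:+d}: {insert_length(five, three)} bp")
```

Under this convention the -2483/+114 insert is 2597 bp (about 2.6 kb) and the -2736/+114 insert is 2850 bp (about 2.9 kb), in line with the sizes given in the text.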

In ROS 17/2.8 osteosarcoma cells, the BMP-2 promoter activity was consistently 
higher than in either the primary fetal rat calvarial osteoblasts or HeLa cells (Figure 8). All of 
the deletion constructs showed similar promoter activity in ROS 17/2.8 osteosarcoma cells. 
The transformed state of ROS 17/2.8 cells may be responsible for the marked expression of 
the BMP-2 gene. ROS 17/2.8 cells represent a well differentiated osteosarcoma and they 
produce high levels of BMP-2 mRNA. They form tumors in nude mice with bone-like 
material in the tumor (Majeska et al., 1978; Majeska et al., 1980). 

(h) Specificity of the BMP-2 Promoter. 

To analyze the activity of the BMP-2 promoter in cell types not expressing BMP-2 
mRNA, BMP-2 promoter constructs were transfected into CV-1 cells (monkey kidney 
cells). The BMP-2 promoter activity was found to be very low for all constructs. This 
suggests that this region of the BMP-2 promoter is functional only in cells such as primary 
fetal rat calvarial osteoblasts, HeLa and ROS 17/2.8 that express endogenous BMP-2 
mRNA (Anderson & Coulter 1968). CV-1 cells do not express BMP-2 mRNA. The 
BMP-2 promoter is likely active in other cell types that express BMP-2, such as prostate 
cells and chondrocytes, although regulation of transcription may be different in these cells. 



Example 3: USE OF PLASMID CONSTRUCTS CONTAINING BMP 
PROMOTERS WITH REPORTER GENES TO IDENTIFY 
OSTEOGENIC AGENTS 

Plasmid constructs containing BMP promoters with reporter genes have been 
transfected into osteoblastic cells. The cells which have been utilized include primary 
cultures of fetal rat calvarial osteoblasts, cell lines obtained as gifts or commercially 
(MC3T3-E12 cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2 
cells, and the like as provided in the catalog from the ATCC) and bone and cartilage cell 
lines established from transgenic mice. The bone cells are transfected transiently or stably 
with the plasmid constructs, exposed to the chemical compound, agent or factor to be 
tested for 48 hours, and then luciferase or CAT activity is measured in the cell lysates. 

Regulation of expression of the growth factor is assessed by culturing bone cells in 
αMEM medium with 10% fetal calf serum, 1% penicillin/streptomycin and 1% 
glutamine. The cells are placed in microtiter plates at a cell density of 5 x 10^3 cells/100 µl/well. 
The cells are allowed to adhere and are then incubated at 37°C in 5% CO2 for 24 
hours. The media is then removed and replaced with 50 µl αMEM containing 4% fetal calf 
serum, and a 50 µl aliquot containing the compound or factor to be tested in 0.1% BSA 
solution is added to each well. The final volume is 100 µl and the final serum concentration 
is 2% fetal calf serum. Recombinant rat BMP-2 expressed in Chinese hamster ovary cells is 
used as a positive control. 
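
The final concentrations in each well follow from a simple dilution. The sketch below is illustrative Python and assumes that the 50 µl compound aliquot contains no serum and that the 50 µl of medium contains no BSA. The function name is hypothetical.

```python
def final_percent(stock_percent, stock_volume_ul, added_volume_ul):
    """Concentration of a component after adding a volume that lacks it."""
    total_volume_ul = stock_volume_ul + added_volume_ul
    return stock_percent * stock_volume_ul / total_volume_ul

# 50 ul of medium with 4% fetal calf serum plus 50 ul of compound aliquot.
print(final_percent(4.0, 50.0, 50.0))  # 2.0 (percent serum in the 100 ul well)
```

The same calculation applied to the aliquot gives a final BSA concentration of half the 0.1% stock, under the stated assumption.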

The treated cells are incubated at 37°C in 5% CO2 for 48 hours. The media is then 
removed and the cells are rinsed 3 times with phosphate buffered saline (PBS). Excess 
PBS is removed from the wells and 100 µl of cell culture lysing reagent (Promega #E153A) 
is added to each well. After 10 minutes, 10 µl of the cell lysate is added to a 96-well white 
luminometric plate (Dynatech Labs #07100) containing 100 µl luciferase assay buffer with 
substrate (Promega #E152A). The luciferase activity is read using a Dynatech ML2250 
automated 96-well luminometer. The data is expressed as either picograms of luciferase 
activity per well or picograms of luciferase per µg protein. 

Example 4: DEMONSTRATION THAT BONE CELLS 
TRANSFECTED WITH BMP PROMOTERS CAN 
BE USED TO SCREEN FOR OSTEOGENIC AGENTS 

To demonstrate that the present invention is useful in evaluating potential 
osteogenic agents, a random array of chemical compounds from a chemical library obtained 
commercially was screened. It was found that approximately 1 in 100 such compounds 
screened produces a positive response in the present assay system compared with the 
positive control, recombinant BMP-2, which is known to enhance BMP-2 transcription. 
Compounds identified from the random library were subjected to detailed dose-response 
curves, to demonstrate that they enhance BMP messenger RNA expression, and that they 
enhance other biological effects in vitro, such as expression of structural proteins including 
osteocalcin, osteopontin and alkaline phosphatase, and enhance bone nodule formation in 
prolonged primary cultures of calvarial rodent osteoblasts. 

Compounds identified in this way can be tested for their capacity to stimulate bone 
formation in vivo in mice. To demonstrate this, the compound can be injected locally into 
subcutaneous tissue over the calvarium of normal mice and then the bone changes are 
followed histologically. It has been found that certain compounds identified by the present 
invention stimulate the formation of new bone in this in vivo assay system. 

The effects of compounds are tested in ICR Swiss mice, aged 4-6 weeks and 
weighing 13-26 g. The compound at 20 mg/kg or vehicle alone (100 µl of 5% DMSO and 
phosphate-buffered 0.9% saline) is injected three times daily for 7 days. The injections 
are given into the subcutaneous tissues overlying the right side of the calvaria of five mice 
in each treatment group in each experiment. 
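
The amount of compound per injection scales with body weight. The sketch below is illustrative Python and assumes that the 20 mg/kg figure applies to each injection, which the text does not state explicitly. The function name is hypothetical.

```python
def dose_mg(body_weight_g, dose_mg_per_kg=20.0):
    """Compound mass (mg) per injection for a mouse of the given weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0

INJECTIONS = 3 * 7  # three times daily for 7 days

# Lower and upper ends of the 13-26 g weight range given in the text.
for weight_g in (13, 26):
    per_injection = dose_mg(weight_g)
    print(weight_g, "g:", per_injection, "mg per injection,",
          round(per_injection * INJECTIONS, 2), "mg over the course")
```

For the stated weight range this gives 0.26 to 0.52 mg per injection across 21 injections, under the per-injection assumption.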

Mice are killed by ether inhalation on day 14, i.e. 7 days after the last injection of 
compound. After fixation in 10% phosphate-buffered formalin, the calvariae are examined. 
The occipital bone is removed by cutting immediately behind and parallel to the lambdoid 
suture, and the frontal bone is removed by cutting anterior to the coronal suture using a 
scalpel blade. The bones are then bisected through the coronal plane and the 3- to 4-mm 
strips of bone are decalcified in 14% EDTA, dehydrated in graded alcohols, and embedded 
in paraffin. Four 3 µm thick nonconsecutive step sections are cut from each specimen and 
stained using hematoxylin and eosin. 

Two representative sections from the posterior calvarial strips are used. 
Histological measurements are carried out using a digitizing tablet and the Osteomeasure 
image analysis system (Osteometrics Inc., Atlanta, GA) on the injected and noninjected 
sides of the calvariae in a standard length of bone between the sagittal suture and the 
muscle insertion at the lateral border of each bone. Measurements consist of (1) Total 
bone area (i.e., bone and marrow between inner and outer periosteal surfaces); (2) Area of 
new woven bone formed on the outer calvarial surface; (3) The extent of osteoblast lined 
surface on the outer calvarial surface; (4) The area of the outer periosteum; and (5) The 
length of calvarial surface. From these measurements, the mean width of new bone and 
periosteum and the percentage of surface lined by osteoblasts on the outer calvarial surface 
can be determined. 
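
One plausible way to derive these indices from the primary measurements is sketched below in Python. It assumes that mean width is computed as area divided by surface length and that the osteoblast-lined percentage is the lined extent divided by surface length. The text does not give these formulas, so they, the function name, the units and the sample values are all assumptions for illustration.

```python
def derived_indices(new_bone_area, periosteum_area, osteoblast_surface, surface_length):
    """Derive mean widths and percent osteoblast-lined surface.

    Areas are taken to be in mm^2 and lengths in mm (an assumption; any
    consistent pair of area and length units works).
    """
    if surface_length <= 0:
        raise ValueError("calvarial surface length must be positive")
    return {
        "mean_new_bone_width": new_bone_area / surface_length,
        "mean_periosteum_width": periosteum_area / surface_length,
        "percent_osteoblast_surface": 100.0 * osteoblast_surface / surface_length,
    }

# Hypothetical measurements, for illustration only.
print(derived_indices(0.08, 0.04, 1.5, 2.0))
```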

By reference to the above disclosure and examples, it is seen that the present 
invention provides a new cell-based assay for identifying and evaluating compounds which 
stimulate the growth of bone. Also provided in accordance with the present invention are 
promoter regions of bone morphogenetic protein genes, and a system for identifying 
osteogenic agents utilizing such promoters operatively linked to reporter genes in 
expression vectors. 

The present invention provides the means to specifically identify osteogenic agents 
which stimulate bone cells to produce bone growth factors in the bone morphogenetic 
protein family. These osteogenic agents are shown to be useful to increase the activity of 
the promoters of genes of members of the BMP family and other bone growth factors 
normally produced by bone cells. 



Example 5: RESEQUENCING OF THE BMP-2 5' FLANKING REGION 

The BMP-2 5' flanking region described in Example 2 was resequenced. The 
nucleotide sequence of the 5' flanking region of the mouse BMP-2 gene is provided in 
Figure 11. The sequence information in Figure 11 corrects sequencing errors that are 
present in Figures 5 and 9. The nucleotide sequence of Figure 11 replaces bases -2736 to 
+119 provided in Figure 5 and bases 1 to 2855 provided in Figure 9. The non-nucleotide 
sequence information provided in Figure 5 is applicable to the corresponding bases in 
Figure 11 where such bases are present. 
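
The two numbering schemes mentioned above cover the same number of bases (2736 upstream plus 119 downstream gives 2855). A minimal Python sketch of the correspondence follows. It assumes that base 1 of the Figure 9 numbering corresponds to -2736 in the Figure 5 numbering, that the region is contiguous, and that there is no position 0. The function name is hypothetical.

```python
def figure5_to_figure9(position):
    """Map a Figure 5 coordinate (-2736..-1, +1..+119) to a 1-based index (1..2855)."""
    if position == 0 or not -2736 <= position <= 119:
        raise ValueError("position outside -2736..+119 or equal to 0")
    # Upstream bases shift by 2737; downstream bases skip the missing position 0.
    return position + 2737 if position < 0 else position + 2736

print(figure5_to_figure9(-2736), figure5_to_figure9(-1),
      figure5_to_figure9(1), figure5_to_figure9(119))  # 1 2736 2737 2855
```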
All publications and patent applications cited in this specification are herein 
incorporated by reference as if each individual publication or patent application were 
specifically and individually indicated to be incorporated by reference. 

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity and understanding, it will be apparent to 
those of ordinary skill in the art in light of the teaching of this invention that certain changes 
and modifications may be made thereto without departing from the spirit or scope of the 
appended claims. 
(B) STREET: 401 B. Street, Suite 1700 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Weseman, James C. 

(B) REGISTRATION NUMBER: 30,507 

(C) REFERENCE/DOCKET NUMBER: P00060US0 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (619) 699-3604 

(B) TELEFAX: 619-236-1048 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2310 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(A) NAME/KEY: 

(B) LOCATION: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GGGAGGAAGG GAAGAAAGAG AGGGAGGGAA AAGAGAAGGA AGGAGTAGAT GTGAGAGGGT 60 

GGTGCTGAGG GTGGGAAGGC AAGAGCGCGA GGCCTGGCCC GGAAGCTAGG TGAGTTCGGC 120 

ATCCGAGCTG AGAGACCCCA GCCTAAGACG CCTGCGCTGC AACCCAGCCT GAGTATCTGG 180 

TCTCCGTCCC TGATGGGATT CTCGTCTAAA CCGTCTTGGA GCCTGCAGCG ATCCAGTCTC 240 

TGGCCCTCGA CCAGGTTCAT TGCAGCTTTC TAGAGGTCCC CAGAAGCAGC TGCTGGCGAG 300 

CCCGCTTCTG CAGGAACCAA TGGTGAGCTC GAGTGCAGGC CGAAAGCTGT TCTCGGGTTT 360 

GTAGACGCTT GGGATCGCGC TTGGGGTCTC CTTTCGTGCC GGGTAGGAGT TGTAAAGCCT 420 

TTGCAACTCT GAGATCGTAA AAAAAATGTG ATGCGCTCTT TCTTTGGCGA CGCCTGTTTT 480 

GGAATCTGTC CGGAGTTAGA AGCTCAGACG TCCACCCCCC ACCCCCCGCC CACCCCCTCT 540 

GCCTTGAATG GCACCGCCGA CCGGTTTCTG AAGGATCTGC TTGGCTGGAG CGGACGCTGA 600 

GGTTGGCAGA CACGGTGTGG ATTTTAGOAG CCATTCCGTA GTGCCATTCG GAGCGACGCA 660 

CTGCCGCAGC TTCTCTGAGC CTTTCCAGCA AGTTTGTTCA AGATTGGCTC CCAAGAATCA 720 

TGGACTGTTA TTATGCCTTG TTTTCTGTCA GTGAGTCCAG AGACACC ATG ATT CCT 776 
Met Ile Pro 
1 

GGT AAC CGA ATG CTG ATG GTC GTT TTA TTA TGC CAA GTC CTG CTA GGA 824 
Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val Leu Leu Gly 
5 10 15 

GGC GCG AGC CAT GCT ACT TTG ATA CCT GAG ACC GGG AAG AAA AAA GTC 872 
Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys Lys Lys Val 
20 25 30 35 

GCC GAG ATT CAG GGC CAC GCG GGA GGA CGC CGC TCA GGG CAG AGC CAT 920 
Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly Gln Ser His 
40 45 50 

GAG CTC CTG CGG GAC TTC GAG GCG ACA CTT CTA CAG ATG TTT GGG CTG 968 
Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met Phe Gly Leu 
55 60 65 

CGC CGC CGT CCG CAG CCT AGC AAG AGC GCC GTC ATT CCG GAT TAC ATG 1016 
Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro Asp Tyr Met 
70 75 80 

AGG GAT CTT TAC CGG CTC CAG TCT GGG GAG GAG GAG GAG GAA GAG CAG 1064 
Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu Glu Glu Gln 
85 90 95 

AGC CAG GGA ACC GGG CTT GAG TAG CCG GAG CGT CCC GCC AGC CGA GCC 1112 
Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala Ser Arg Ala 
100 105 110 115 

AAC ACT GTG AGG AGT TTC CAT CAC GAA GAA CAT CTG GAG AAC ATC CCA 1160 
Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu Asn Ile Pro 
120 125 130 

GGG ACC AGT GAG AGC TCT GCT TTT CGT TTC CTC TTC AAC CTC AGC AGC 1208 
Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn Leu Ser Ser 
135 140 145 

ATC CCA GAA AAT GAG GTG ATC TCC TCG GCA GAG CTC CGG CTC TTT CGG 1256 
Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg Leu Phe Arg 
150 155 160 

GAG CAG GTG GAC CAG GGC CCT GAC TGG GAA CAG GGC TTC CAC CGT ATA 1304 
Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe His Arg Ile 
165 170 175 

AAC ATT TAT GAG GTT ATG AAG CCC CCA GCA GAA ATG GTT CCT GGA CAC 1352 
Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val Pro Gly His 
180 185 190 195 

CTC ATC ACA CGA CTA CTG GAC ACC AGA CTA GTC CAT CAC AAT GTG ACA 1400 
Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His Asn Val Thr 
200 205 210 

CGG TGG GAA ACT TTC GAT GTG AGC CCT GCA GTC CTT CGC TGG ACC CGG 1448 
Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg Trp Thr Arg 
215 220 225 

GAA AAG CAA CCC AAT TAT GGG CTG GCC ATT GAG GTG ACT CAC CTC CAC 1496 
Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr His Leu His 
230 235 240 

CAG ACA CGG ACC CAC CAG GGC CAG CAT GTC AGA ATC AGC CGA TCG TTA 1544 
Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser Arg Ser Leu 
245 250 255 

CCT CAA GGG AGT GGA GAT TGG GCC CAA CTC CGC CCC CTC CTG GTC ACT 1592 
Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu Leu Val Thr 
260 265 270 275 

TTT GGC CAT GAT GGC CGG GGC CAT ACC TTG ACC CGC AGG AGG GCC AAA 1640 
Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg Arg Ala Lys 
280 285 290 

CGT AGT CCC AAG CAT CAC CCA CAG CGG TCC AGG AAG AAG AAT AAG AAC 1688 
Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys Asn Lys Asn 
295 300 305 

TGC CGT CGC CAT TCA CTA TAC GTG GAC TTC AGT GAC GTG GGC TGG AAT 1736 
Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val Gly Trp Asn 
310 315 320 
GAT TGG ATT GTG GCC CCA CCC GGC TAC CAG GCC TTC TAC TGC CAT GGG 1784 
Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr Cys His Gly 
325 330 335 

GAC TGT CCC TTT CCA CTG GCT GAT CAC CTC AAC TCA ACC AAC CAT GCC 1832 
Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr Asn His Ala 
340 345 350 355 

ATT GTG CAG ACC CTA GTC AAC TCT GTT AAT TCT AGT ATC CCT AAG GCC 1880 
Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile Pro Lys Ala 
360 365 370 

TGT TGT GTC CCC ACT GAA CTG AGT GCC ATT TCC ATG TTG TAC CTG GAT 1928 
Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu Tyr Leu Asp 
375 380 385 

GAG TAT GAC AAG GTG GTG TTG AAA AAT TAT CAG GAG ATG GTG GTA GAG 1976 
Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met Val Val Glu 
390 395 400 

GGG TGT GGA TGC CGC TGAGATCAGA CAGTCCGGAG GGCGOACACA CACACACACA 2031 
Gly Cys Gly Cys Arg 
405 

CACACACACA CACACACACA CACACACACA CGTTCCCATT CAACCACCTA CACATACCAC 2091 

ACAAACTGCT TCCCTATAGC TGGACTTTTA TCTTAAAAAA AAAAAAAAGA AAGAAAGAAA 2151 

GAAAGAAAGA AAAAAAATGA AAGACAGAAA AGAAAAAAAA AACCCTAAAC AACTCACCTT 2211 

GACCTTATTT ATGACTTTAC GTGCAAATGT TTTGACCATA TTGATCATAT TTTGACAAAT 2271 

ATATTTATAA AACTACATAT TAAAAGAAAA TAAAATGAG 2310 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 408 amino acids 

(B) TYPE: amino acid 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Ile Pro Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val 
1 5 10 15 

Leu Leu Gly Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys 
20 25 30 

Lys Lys Val Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly 
35 40 45 

Gln Ser His Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met 
50 55 60 

Phe Gly Leu Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro 
65 70 75 80 

Asp Tyr Met Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu 
85 90 95 

Glu Glu Gln Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala 
100 105 110 

Ser Arg Ala Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu 
115 120 125 

Asn Ile Pro Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn 
130 135 140 

Leu Ser Ser Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg 
145 150 155 160 

Leu Phe Arg Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe 
165 170 175 

His Arg Ile Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val 
180 185 190 

Pro Gly His Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His 
195 200 205 

Asn Val Thr Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg 
210 215 220 

Trp Thr Arg Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr 
225 230 235 240 

His Leu His Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser 
245 250 255 

Arg Ser Leu Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu 
260 265 270 

Leu Val Thr Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg 
275 280 285 

Arg Ala Lys Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys 
290 295 300 

Asn Lys Asn Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val 
305 310 315 320 

Gly Trp Asn Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr 
325 330 335 

Cys His Gly Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr 
340 345 350 

Asn His Ala Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile 
355 360 365 

Pro Lys Ala Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu 
370 375 380 

Tyr Leu Asp Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met 
385 390 395 400 

Val Val Glu Gly Cys Gly Cys Arg 
405 
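
The correspondence between the coding region of SEQ ID NO:1 and the protein of SEQ ID NO:2 can be spot-checked by translating codons with the standard genetic code. The sketch below is illustrative Python: the table construction and function name are not part of the listing, and only the first fourteen codons of the open reading frame are used as input. Any character other than A, C, G or T in the input raises a `KeyError`, which is a deliberate simplification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string (whitespace ignored) until the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# First fourteen codons of the open reading frame in SEQ ID NO:1.
orf_start = "ATG ATT CCT GGT AAC CGA ATG CTG ATG GTC GTT TTA TTA TGC"
print(translate(orf_start))  # MIPGNRMLMVVLLC
```

In one-letter code the output corresponds to Met Ile Pro Gly Asn Arg Met Leu Met Val Val Leu Leu Cys, the first fourteen residues of SEQ ID NO:2.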

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2688 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60 

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240 

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360 

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660 

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900 

AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960 

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 
AAGGTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140 

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380 

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500 

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560 

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860 

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980 

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040 

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC OGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGGAT 2280 

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400 

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCCC AGCCTAAGAC GCCTGCGCTG 2520 

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580 

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640 

CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAG 2688 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2875 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60 
AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120 
CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180 
GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240 
TTGATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300 
AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360 
GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420 
TATAGGTTCG GAGTTTCTTG CTTTGCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480 
AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540 
CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600 
TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660 
GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720 
CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780 
TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840 
CACGTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900 
AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960 
CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020 
GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080 
AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAQA CCCAGTGGAG 1140 
AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 1200 
TGGAGTAGGT GGGTGTGGAA GGGAAOATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260 
TAAAGAGAAT TTCTATTAAC TCTCATTOTC CCTCACATGG ACACACACAC ACACACACAC 1320 
ACACACACAC ACACATCACT AGAAGGGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380 

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440 

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500 

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560 

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620 

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680 

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740 

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800 

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920 

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCTCGTCG CCGCCGGAGT 1980 

CCTCGCCCTG CCGCGCAGAG CCCTGCTCGC ACTGCGCCCG CCGCGTGCGC TTCCCACAGC 2040

CCGCCCGGGA TTGGCAGCCC CGGACGTAGC CTCCCCAGGC GACACCAGGC ACCGGGACGC 2100 

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160 

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220 

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280 

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340 

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400 

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460 

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520 

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580 

CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640 

CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700 

CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760 

GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820 

CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCC 2875
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCCGGCAAGT TCAAGAAG 18
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15144 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60
AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120
CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180
GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240
TTGATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300
AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360
GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420
TATAGGTTCG GAGTTTCTTG CTTTCCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480
AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540
CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600
TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660
GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720
CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780
TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840
CACGTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900

AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960 

CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020 

GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080 

AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG 1140 

AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 1200

TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260 

TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG ACACACACAC ACACACACAC 1320 

ACACACACAC ACACATCACT AGAAGGGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380 

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440 

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500 

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560 

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740 

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800 

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860 

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920 

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG 1980 

TCCTCGCCCT GCCGCGCAGA GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG 2040 

CCCGCCCGGG ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC 2100 

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160 

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220 

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280 

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340 

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400 

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460 

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520 

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580 

SUBSTITUTE SHEET (RULE 26) 



WO 96/38590 



-35- 



PCT/US96/08197 



CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640 

CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700

CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760 

GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820

CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCCGATCA 2880 

CCTCTCTTCC TCAGCCCGCT GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC 2940

GGAGAAGGAG GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCCTC 3000

GGACAGAGCT TTTTCCATGT GGAGACTCTC TCAATGGACG TGCCCCCTAG TGCTTCTTAG 3060 

ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG GACCCGGGGT TGGCTGGCGG 3120 

GTGACACCGC TTCCCGCCCA ACGCAGGGCG CCTGGGAGGA CTGGTGGAGT GGAGTGGACG 3180 

TAAACATACC CTCACCCGGT GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA 3240 

CCCCAGATCC CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 3300 

GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC AGAAATCCAT 3360 

TTAAGAGTAT GGCCAGTAGA TTTTACTAGT TCATTGCTGA CCAGTAAGTA CTCCAAGCCT 3420 

TAGAGATCCT TGGCTATCCT TAAGAAGTAG GTCCATTTAG GAAGATACTA AAAGTTGGGG 3480 

TTCTCCATGT GTGTTTACTG ACTATGCGAA TGTGTCATAG CTTACACGTG CATTCATAAA 3540 

CACTATCTAT TTAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 3600 

GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC GAGGCCACAA 3660 

CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG CTTACTGAAT CTACAAGTTT 3720 

GATATGCTCA ACTACCAGGA AATTGTATAC AGCGCCTCTA AGGAAGTCAC TTGTGCATTT 3780 

GTGTCTGTTA ATATGCACAT GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG 3840 

ACCAACCTAT GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 3900 

GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA AGTAAGAAGT 3960

CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC TATGGGAGCC GAGGCGCGGG 4020 

GGCGGCGGAG GACTGGGCGG GGAACGTGGG TGACTCACGT CGGCCCTGTC CGCAGGTCGA 4080 

CCATGGTGGC CGGGACCCGC TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG 4140 

GCGCGGCCGG CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 4200 

CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG CTCAGCATGT 4260

TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT GGTGCCCCCC TATATGCTAG 4320

ATCTGTACCG CAGGCACTCA GGCCAGCCAG GAGCGCCCGC CCCAGACCAC CGGCTGGAGA 4380

GGGCAGCCAG CCGCGCCAAC ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG 4440 

GTGGCGGGGC GGGGACGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGG CCTCCACTAG 4500 

CACAGTAGAA GGCCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC CAGGGATTCC 4560

CCGCTTGTGA GTCCTCACCC TTTCCTGGCA AGTAGCCAAA AGACAGGCTC CTCCCCCTAG 4620 

AACTGGAGGG AAATCGAGTG ATGGGGAAGA GGGTGAGAGA CTGACTAGCC CCTAGTCAGC 4680 

ACAGCATGCG AGATTTCCAC AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG 4740 

CTCAGATCTG TGACTTGTGT TCACGCTGTA GTTTTAAGCT AGGCAGAGCA AGGGCAGAAT 4800

GTTCGGAGAT AGTATTAGCA AATCAAATCC AGGGCCTCAA AGCATTCAAA TTTACTGTTC 4860 

ATCTGGGCCT AGTTTGAAAG ATTTCTGAAT CCCTATCTAA TCCCCGTGGG AGATCAATTC 4920 

CACAATTCGT CATATTGTTT CCACAATGAC CTTCGATTCT TTGCTTAAAT CTTAAATCTC 4980 

CAAGTGGAGA CAGCGCAACG CTTCAGATAA AAGCCTTTCT CCCACTGCCT GCTACCTTCC 5040 

TAGGCAAGGC AATGGGGTTT TTAAACAAAT ATATGAATAT GATTTCCCAA GATAGAATAA 5100 

TGTTGTTTAT TTCAGCTGAA ATTTCCTGGA TTAGAAAGGC TGTAGAGGCC TATTGAAGTC 5160 

TCTTGCACCG ATGTTCTGAA AGCAGTTAGT AAAAAATCAT GACCTAGCTC AATTCTGTGT 5220 

GTGCCACTTT CAATGTGCTT TTGACTTAAT GTATTCTCCA TAGAACATCA GTTCCTTCAA 5280 

GTTCTAGAAG AATTCAGATT TAAAGTTTTG CTTTGCCTTG CTGAGGGGAT AAATTTTAAG 5340 

TAGAAATCTA GGCTCTGAAA TGATAGCCCA ACCCCATCTC CAGTAAGGGA TGACTGACTC 5400

AAACCTTGAG AAGTCTGGGT GATAATAGGA AAAGTCCACA AGCAGGTCAC AGAGCGCGAG 5460

ATGGATCTGT CTTGAGGCAG CCAATGGTTA TGAAGGGCAC TGGAAATCCA TCTCTTTCAA 5520

ACTGGTGTCT AGGGCTTTCT GGGAGCAAAG CTTAGACCAC ATTCTGCTCC TCAAGGTTTG 5580 

CCTACTGAAA GCAGGGAGAT TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT 5640

GGGCTTAGCT AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 5700 

CAAACCTACT ATCCAGCACA GGTGTTTTTC CCACTGCCTC TGGAGATATA GCAAGAAAAC 5760 

CATATATTCA TGTATTTCCT TATTAGTCTT TTCTAACGTG AAAATTATTC CTGACCTATA 5820 

AAAAATGAAG GAGGTATTTT ATCTTAACTA AGCTAAAAGA ATCGCTTAAG TCAATTGAAA 5880 

CTCAAAAATC CAATTGAATG AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC 5940 

CTTTGGAAAT AGCTTGATAA AAACACAGAC AAAACAAAGT CTGTGTGCTT ATTTGAAAAC 6000 

TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT CTGTTGTAAA 6060

ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTGCA CTAACTTTAT ACCTTGGTGC 6120

AATTAGATGT AATGTTTACT GTAAATTTCA GGAAAACCAT TTTTTTTTTT TGGTCATGAT 6180 

CAGGTACACA TGGCATTTGG GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG 6240 

TTTGTTTGTT TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 6300 

TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT GAGACAAAAC 6360 

ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG ACAAAATCTC CTGTTGAACA 6420 

TTAAGACCAT GGATTTTTAT CCAGGAGAGC CCAGGCTTTG CTGAATCACC ACCCTCCAAC 6480 

CCCACTCCAA GGTCACCGAA GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT 6540 

TGATTGACTC CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 6600 

ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCTTT CTTTTTCTTT ATGGAATAAA 6660 

CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA GAGATCCTGT TAGCAGAAGT 6720 

GCAACTGCCC AGAAACTAGC CACAGGCTAG GATATTCCAA AGTACAACTC TAAAGTATGG 6780 

TCCATCCTAA ATTCTAGCAT GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT 6840 

CTGGCTATTG CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 6900

ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC TTTCAATCCC 6960 

TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG CCTTGTGGCA TTTCGCTTGG 7020 

TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG GTTCCCTTGT AGGATCTGTG TTCAGGAGAA 7080 

CAGGGACCTT GGCAGGTTAG TGACAACTAC CAAACCCTGC TTTCCTTCCC TGCCACTTCC 7140 

TTTGTTGCCT TAAAAATTAA ACCTTAACTC TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 7200 

TGTCATTTAC TTTATTTATT TGTCATGTAC TTTATCCTGT AGAAAATCAC AGTGTGGCCC 7260 

AAAGCCCCTT GAATCTTGTT GCAGCGGTGA GATGCAGCTG CTGATCTGGA ATAGCCTTAG 7320

GCTGTGTGTT TGATCACAAT GCTTTCTGTC CAAAAGTGTG CAAATCCTCC AAGCTTAATG 7380 

ATAACTTTTG AAATGAAACT CACCCTACTT TAGGGCAAAC AAGTAGCCAC AGAGAGCAGG 7440 

ATCTAAACAA GGTCTGGTGT CCCATTTGGC TGTGTCCCTT CAATTTTCTG TTCATTTAGC 7500 

TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT AAGTTTTGAT CTTCAGGGCA AAACTCAATC 7560 

TTCAGTTACC ATGGTATCAG GTACCAATTC CTAGTGATTT GTGCTATGGC TTAGGATTTG 7620 

ATTTCTCTCC TACATTAGGT AATATCTTTC AATGGCTAGA ACTTGGGCAT TGCAGTACAC 7680 

TCAAGTTAAC AGTTCTGTGA CCTAAGGAAG TCACATAACC TCTCTGAATT CTCTACTGTT 7740 

TCATTCACAA AATGGAGAAA ATCATGGCTC TTTCTTAATG TGCGAATTCA TAGAAAGGTG 7800

ATGACACCAG ATTTGGCAGA AGGAAGGAAA GGAAGGAAGG AAGAAAGAAA GAAAGAAAGA 7860

AAGAAAGAAA GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG GAGAGAGAGA GAAGGGAAGG 7920 

GAAAGGGAAA GGGAAAGGAA AGAAAAGAAA GGAAGGAAGA AAAGGAAGGA AGGAAGGAAA 7980 

GAAGGAAGGA AGGAAAAGAA AGAGAAGAAA GCATTCAGCA TATGAACTAA TGTTTCCTGG 8040 

TGACTTTTTA TATCATATCC TTGTTCTAGG AAGTGGCCCT AGCCATATCT TTTGGGTTAT 8100 

TTTGAGGTAG AGGATAATCA ACATAGTGTA GAACATTAAA TCTGGGTTTT GTTTCTAGAA 8160 

GAGGCTAGAA TGGCATGGCT GTCCCACTTG CTCCTCTTTC AGGCAGTATG GCAGCCACCA 8220 

TTCTCTCTGT AAGATCTAGG AGGCTGACAC TCAGGTTGGA GACAGGTCAG AATCCTGAAA 8280 

TCACTTAGCA AGTTCAGCTG ATTCAACAAG GGATATTTAC AGAGAATTAA CAGCTATTCC 8340 

AGCTTCCAAA AAGTGTACAT TACCTACTCT GTATTTTCAG AACCCCAGGT TTGCTGTGAT 8400

AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGATATTTT CATTTTCCAC 8460 

CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA GGTGTTGTGT GTCCTGACCC 8520

TGAGGAAGTT GGCCTTGTTG AGGTCTTCTG TAAATTCTTG AATTCTCTGT ATAATTTCAA 8580 

TGAATAGTCA TGTTTGATAC CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG 8640 

CTGATGGAAA CGCTGCTGAA AGACTAGAGA TTGCTCTTTC CTTTGGCATC TGTCTTGGGT 8700

AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT CTTACCTCCA 8760 

GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTGCAG TTGAAAGATG ACAATTTCAT 8820 

ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG AGATATTATT CTGTTACTTC AATGACCTGT 8880

CTCCATTATT TATCTTGAGG CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG 8940 

AAGGCCCTGG GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 9000 

ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT ACTGTTTTTA 9060 

TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA CACCAGAATG GTGATGAAAG 9120 

GCTTTTTATA ACAGAAGCTA AATGCAGTCC TTCATACTTC ATGGAATGCC CCTGTCCTAA 9180 

AGTACCATTA ACCGATAGTG GAGTCAGAAC ATAAATGGCT CCCCAAAGGT ATCACCAAGA 9240 

ACTTTTGGCA AACAGATGCA AGAGGATTAT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 9300 

CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG CTTAACATAG 9360 

CATGAACCCT CATGTGTTGG CCAACATTAA GGCTTTTTCT ATAAAAGTCT CCTCCTTCAT 9420 

CAGTATACGC TCGAGTATGA AAAGCATCCT TTTAAACCTT GACTCTGTGT GGTCCAGAAA 9480 

CAGCAGCATC CCTTGCTTAA GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA 9540

CCGGCTGATG TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 9600

TGAGCATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA TCGTGAATTG 9660

CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC TATGTCCCTG TGGATGCCTT 9720 

TGTCTCTAGA GTTCTGAGCA AAATGGTAGG GTGTGCTTTG CAAAATGTCA TCATTGATGT 9780

TGAATTTCAA AGTCTTTAAT TAAGGGGCTG AAATCTGTAT ATTGAGATTT GTAAATCATC 9840

TAAATTGTAG AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 9900 

TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTCAGTCA TTAGCTCTCT ACTAAGTACA 9960

GTGCTGACTT TTTTAAAATT AAAGTCTGTG AATTCCAAAG AAGTGTTTCA CTATTTCCTC 10020 

CATTATTATA GCTACCTAGA AGCTATGTTC ATATATTGGA TTAAAAACGT AGCAATTACA 10080 

AAGTTAATGT GGCCATATAG AAAAGGGAAA AGAAACTCCG CTTTCACTTT AATATATATA 10140 

TGTGTGTGTG TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 10200 

TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC TCAGGCCATG 10260 

GACAGAGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT TAATTTCTGG TTGCAGGTTT 10320

TTATGTAGTT TAACTTAAAC CATGCACTGA AGTTTTAAAT GCTCGTAAGG AATTAAGTTA 10380 

CCATTGGCTC TCTTACCAAA TGCGTTTCTT TTTTCTCTCC ACCCTGATCA AACTAGAAGC 10440 

CGTGGAGGAA CTTCCAGAGA TGAGTGGGAA AACGGCCCGG CGCTTCTTCT TCAATTTAAG 10500

TTCTGTCCCC AGTGACGAGT TTCTCACATC TGCAGAACTC CAGATCTTCC GGGAACAGAT 10560 

ACAGGAAGCT TTGGGAAACA GTAGTTTCCA GCACCGAATT AATATTTATG AAATTATAAA 10620 

GCCTGCAGCA GCCAACTTGA AATTTCCTGT GACCAGACTA TTGGACACCA GGTTAGTGAA 10680 

TCAGAACACA AGTCAGTGGG AGAGCTTCGA CGTCACCCCA GCTGTGATGC GGTGGACCAC 10740 

ACAGGGACAC ACCAACCATG GGTTTGTGGT GGAAGTGGCC CATTTAGAGG AGAACCCAGG 10800 

TGTCTCCAAG AGACATGTGA GGATTAGCAG GTCTTTGCAC CAAGATGAAC ACAGCTGGTC 10860 

ACAGATAAGG CCATTGCTAG TGACTTTTGG ACATGATGGA AAAGGACATC CGCTCCACAA 10920 

ACGAGAAAAG CGTCAAGCCA AACACAAACA GCGGAAGCGC CTCAAGTCCA GCTGCAAGAG 10980

ACACCCTTTG TATGTGGACT TCAGTGATGT GGGGTGGAAT GACTGGATCG TGGCACCTCC 11040 

GGGCTATCAT GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 11100 

CTCCACTAAC CATGCCATAG TGCAGACTCT GGTGAACTCT GTGAATTCCA AAATCCCTAA 11160 

GGCATGCTGT GTCCCCACAG AGCTCAGCGC AATCTCCATG TTGTACCTAG ATGAAAATGA 11220 

AAAGGTTGTG CTAAAAAATT ATCAGGACAT GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA 11280 






CAGCAAGAAT AAATAAATAA ATATATATAT TTTAGAAACA GAAAAAACCC TACTCCCCCT 11340 
GCCTCCCCCC CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 11400 
GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC GAAAAGAAGT 11460 
TGGGAAAACA AATATTTTAA TCAGAGAATT ATTCCTTAAA GATTTAAAAT GTATTTAGTT 11520 
GTACATTTTA TATGGGTTCA ACTCCAGCAC ATGAAGTATA AGGTCAGAGT TATTTTGTAT 11580
TTATTTACTA TAATAACCAC TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA 11640
TCAGAAGAAA TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 11700 
TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA AAGTGCTTGC 11760 
ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT TCACAAGTTC AAGTCCAAAA 11820 
AAAAAAAAAA AGGATAATCT ACTTTGCTGA CTTTCAAGAT TATATTCTTC AATTCTCAGG 11880 

AATGTTGCAG AGTGGTTGTC CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG 11940 

GATAAGAACC AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 12000 

AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG AAAGTGAGTG 12060 

AATGGCTCTT GTTCTTXCTT AAGCCTATAA TCCTTCCAGG GGGCTGATCT GGCCAAAGTA 12120 

CTAAATAAAA TATAATATTT CTTCTTTATT AACATTGTAG TCATATATGT GTACAATTGA 12180 

TTATCTTGTG GGCCCTCATA AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC 12240 

AGCAATCTCT CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 12300 

TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG TCAGCTTGCT TTTCTTTCCA 12360

GGATATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA TTTCAGATAA TAAATATCAA 12420 

AATTCTGGCA TTTTCATCCC TATAAAAACC CTAAACCCCG TGAGAGCAAA TGGTTTGTTT 12480 

GTGTTTGCAG TGTCTACCTG TGTTTGCATT TTCATTTCTT GGGTGAATGA TGACAAGGTT 12540 

GGGGTGGGGA CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 12600 

GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG AAAGCCAATG 12660 

CCTAAATCAA AGCAAAGTTT GCAGAACCCA AGGTAAAGTT CCAGAGATGA TATATCATAC 12720 

AACAGAGGCC ATAGTGTAAA AAAATTAAAG AATGTCTGAT CAGCGTCTCA GCACATCTAC 12780 

CAATTGGCCA GATGCTCAAA CAGAGTGAAG TCAGATGAGG TTCTGGAAAG TGAGTCCTCT 12840

ATGATGGCAG AGCTTTGGTG CTCAGGTTGG AAGCAAAACC TAGGGAGGGA GGGCTTTGTG 12900 

GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG TCAGTTTCCG 12960

GAGTGTGTGT CCTGTAGCCC TCCGTCATGG TTGAAGCCCA GGTCTCACCT CCTCTCCTGA 13020 




CCCGTGCCTT AGAACTGACT TGGAAAGCGG TGTGCTTACA GCAAGACAGA CTCTTATAAT 13080
TAAATTCTTC CCAAGGACCT CCGTGCAATG ACCCCAAGCA CACTTACCTT CGGAAACCTT 13140
AAGGTTCTGA AGATCTTGTT TTAAATGACT ACCCTGGTTA GCTTTTGATG TGTTCCTTAT 13200
CCCTTTAGTT GTTGCACAGG TAGAAACGAT TAGACCCAAC TATGGGTAGC CTTGTCCTCC 13260
TGGTCCTTCA GTCATTCTCT AATGTCTCTT GCTTGCCATG GGCACTGTAA CAAACTGCAA 13320
TCTTAACATC TTATAAAATG AATGAACCAC ATATTTACAT CTCCAAGTCC TCCAGATGGG 13380
AGTGCGATCA TTCCATAAGG ATCCCACCTT CTGGCAGGTC TATCCAGTAC ATATTTTATG 13440
CTTCATTGGT CTTGATTTTC TTGGCTAAAA TTACTTGTAG CACAGCAGGC CCCATGTGAC 13500
ATATAGGTAT ATACATACAT GTATGTGCAT ATAGTGTGTA CATGTTCTAA TTTATACATA 13560
GCTATGTGAA GATTATGTTA CATATGTAGA TGGTCGCACT TCTGATTTCC ATTTAGGTTC 13620
AGAGAGAGAC GTCACAGTAA ATGGAGCTAT GTCATTGGTA TATCCCCGAG TGGTTCAGGT 13680
GTTCTCTCTA TTTTTTTAAG ATGGAGAACA CTCATCTGTA CTATCGAAAA CTGAGCCAAA 13740
TCACTTAGCA AATTTCTAGT CACTGCCTTG CTGTTAAGAT ACTGATTCAC TGGGTGCTGA 13800
CATGCTGAGC CCTGCCTACT TTTGCATGAA GGACAAGGAA GAGAGCTTGC AGTTAAGAAT 13860
GGTATATGTG GGGCTAGGGG GCGGCGTATA GACTGGCATA TATGTGAAGG AAGGTCACAA 13920
ACAGCCTGCA CTAATTTCCC TTTTCTGGTT TTATGTCTTG GCAGGGGAAA GGACAGGTAG 13980
GGTGGGGTTG AGGGGGAGGG CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT 14040
TCACCCCGCC ACCATATCTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACCA 14100
GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGC CAAAGTTCTA 14160
AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA GGAATGCCTC CAGGAAAGCA 14220
AAAAGCTTGA TGTGTGTACA GCCACGTGGT GGAGTCCTGC CACCCTATGA TTCCTGTCCC 14280
AGTGGTCGTG TGGGGCCTGA GATCCTGAAT TTCTAATGAG CTCCCAGTAC GCCCTGACTC 14340
ACTGTGCCAG AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT 14400
ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA TGAGAAGAAA 14460
CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC CCTGCAGTTT TGTTTTTTGT 14520

TTTTTTTTTT TCAAAGACAT TCTTTTTCCC AACAAGATGA GTGGCAATCT TATGTTCTAG 14580

CCACTCTTAG ACATGAAAAC ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGCTTG 14640
CTTGGGCACG CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATGAGGG 14700
AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAG TCAGTGTGCT 14760

GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA AAGGGATCTC TTTGAAGGCA 14820

GAGGGTAGAT GTCGTATGGT TGAAGCATTT GTTTATACTA AAATGATGCT TGACTTTTTT 14880

TCTAAGTTAT AAGACAGTAC ACTGTATAAG TTCATTGAAC CTAGAGGGTG GCATAGGACT 14940 

CCAAATCTGG TATGGGAGGT TTCTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT 15000

GGAATAAAGT GCTTATGTGA ATGGGCTTAA GCTAGGGAAA AAAATGGGTT TCCCTCTGCA 15060 

AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC ATGAATGCCA CTTGTTAGCA 15120 

GATGCCCTGT GGGGATCCGA ATTC 15144 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9299 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240 

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660 

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900

AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 

AAGGTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140 

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500 

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860 

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC GGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGGAT 2280 

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCCC AGCCTAAGAC GCCTGCGCTG 2520

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580 

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640

CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAGCA GGGCAACCTG 2700

GAGAGGGGCG CTATTCTGAG GATTCGAGGT GCACCCGTAG TAGAAGCTGG GGATGGGGCT 2760 

CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC CTTCTCCAAC AGTGTTGGAG 2820 

GTGGGATGAT GGAGGCTAAA AGGCACCTCC ATATATGTTA CTGCGTCTAT CAACCTACTT 2880

TAGGGAGGTG CGGGCCAGGA GAGGCGGGAA GGAGAGAAGG CCTTGGAAGA GAGGTCATTG 2940 

GGAAGAACTG TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 3000 

GGAGTCAACT CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT TAGGAGAGAG 3060 

AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC TGCTTTGCTC TAAAGCGCGA 3120 

CCCGGGATAG AGAGGCTTCC TTGAGCGGGG TGTCACCTAA TCTTGTCCCC AACGCACCCC 3180

CTCCCAGCCC CTGAGAGCTA GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC 3240 

TATTTTCTTA GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 3300 

AGGGGTGTAA GGTTCCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC TGTGCGGGTC 3360 

TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT AATGAGGAGC TTGCGATACT 3420

CCAAAGGGTT CGGGAATTGC GGGGTCCTTA CACGCAGTGG AGTTGGGCCC CTTTTACTCA 3480

GAAGGTTTCC GCCACGGCTT TGGTTGATAG TTTTTTTAGT ATCCTGGTTT ATGAACTGAA 3540

GGTTTTGTGA GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 3600 

AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC TCCAGGAGCT 3660 

GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG AAAATTTACT TTGATCTGTA 3720 

TTTTTTAATT AAAAAAAATC AGGGAAGAAA GGAGTGATTA GAAAGGGATC CTGAGCGTCG 3780

GCGGTTCCAC GGTGCCCTCG CTCCGCGTGC GCCAGTCGCT AGCATATCGC CATCTCTTTC 3840 

CCCCTTAAAA GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 3900 

AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA AAGCTGCGCT 3960 

GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG GTCTCTGCAC CCAGGGCAGC 4020 

AGTGTGGGAT GGCGCTGGGC AGCCACCGCC GCCAGGAAGG ACGTGACTCT CCATCCTTTA 4080 

CACTTCTTTC TCAAAGGTTT CCCGAAAGTG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG 4140 

GGGGGGGGGA GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 4200

AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC CTCGGATGGA 4260 

CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG AGGTTAGAAC AGGCCAGGCA 4320 

TCTTAGGATA GTCAGGTCAC CCCCCCCCCC AACCCCACCC GAGTTGTGTT GGTGAATTTC 4380

TTGGAGGAAT CTTAGCCGCG ATTCTGTAGC TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA 4440

GGAAGTGGCT GTGCGGGGGT GGCGGTGGGG GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 4500 

AGAGGGAGAG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGG TTTGTAGACG CTTGGGATCG 4560 

CGCTTGGGGT CTCCTTTCGT GCCGGGTAGG AGTTGTAAAG CCTTTGCAAC TCTGAGATCG 4620 

TAAAAAAAAT GTGATGCGCT CTTTCTTTGG CGACGCCTGT TTTGGAATCT GTCCGGAGTT 4680 

AGAAGCTCAG ACGTCCACCC CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC 4740

CGACCGGTTT CTGAAGGATC TGCTTGGCTG GAGCGGACGC TGAGGTTGGC AGACACGGTG 4800 

TGGGGACTCT GGCGGGGCTA CTAGACAGTA CTTCAGAAGC CGCTCCTTCT AACTTTCCCA 4860 

CACCGCTCAA ACCCCGACAC CCCCGCGGCG GACTGAGTTG GCGACGGGGT CAGAGTCTTC 4920 

TGGCTGAAAG TTAGATCCGC TAGGGGTCGG CTGCCTGTCG CTAGAAGCAT TATTTGGCCT 4980 

CTCGGAGACC CGTGTGGAGG AAGTGCTGGA GTGTGCGAGT GTGTTTGCGT GTGTGTGTGT 5040 

GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGCGCGCGC CCTTGGAGGG TCCCTATGCG 5100 

CTTTCCTTTT CATGGAACGC TGTCGTGAGG CTTTGGTAAA CTGTCTTTTC GGTTCCTCTC 5160 

TCGGCTGCAC TTAAGCTTTG TCGGCGCTGT AAAGAGACGC GTCTTCAAGT GCACCCTGAT 5220 

CCTCAGGCTT CAGATAACCC GTCCCCGAAC CTGGCCAGAT GCATTGCACT GCGCGCCGCA 5280 

GGTAGAGACG TGCCCCACGT CCCCTGCGTG CAGCGACTAC GACCGAGAGC CGCGCCAGTG 5340 

TGGTGTCCCG CCGAGAGTTC CTCAGAGCAG GCGGGGACAA CTCCCAGACG GCTGGGGCTC 5400 

CAGCTGCGGG CGCGGAGGTT GGCCTCGCTC GCAGGGGCTG GACCCAGCCG GGGTGGGAGG 5460 

ATGGAGGAGG GGCGGGCGGG CTCTTCGGTG AGTGGGGCGG GGCCTCTGGG TCCACGTGAC 5520 

TCCTAGGGGC TGGAAGAAAA ACAGAGCCTG TCTGCTCCAG AGTCTCATTA TATCAAATAT 5580 

CATTTTAGGA GCCATTCCGT AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG 5640 

CCTTTCCAGC AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT 5700

GTTTTCTGTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC TCTGTCCTCC 5760 

CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT TTCAGTGCCC CGCGCCCTAC 5820 

TCTCAGGCAG CGCTATGGTT CTCTTTCTGG TCCCTGCAAG GCCAGACACT CGAAATGTAC 5880 

GGGCTCCTTT TAAAGCGCTC CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG 5940 

AGCGCGAGGG ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCCCTT GAAGGGCTAA 6000 

CCACTCCCTT ACCAGTCCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC CTTGCCTCAT 6060 

AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG AACAAGACTC TTTAGTGAGC 6120

ATTTTCAACG CAGCGACCAC AATGAAATAA ATCACAAAGT CACTGGGGCA GCCCCTTGAC 6180

TCCTTTTCCC AGTCACTGGA CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT 6240

CTCCCCTCCT CCTGTTCTTA ACCAGCTGGA AGTTGTGGAA ATTGGGCTGG AGGGCGGAGG 6300 

AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG TCCGAAGTGA 6360 

AGAGCGATGG CATTTTAATT CTCCCTCCGC CTCCCCCCTT TACCTCCTCA ATGTTAACTG 6420 

TTTATCCTTG AAGAAGCCAC GCTGAGATCA TGGCTCAGAT AGCCGTTGGG ACAGGATGGA 6480

GGCTATCTTA TTTGGGGTTA TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA 6540 

TTCTTACTTT CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 6600 

GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT TGCTACCCCT 6660

ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT CGGTGTTTAT TTATTCTTTA 6720 

ACCTTCCACC CCAACCCCCT CCCCAGAGAC ACCATGATTC CTGGTAACCG AATGCTGATG 6780

GTCGTTTTAT TATGCCAAGT CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG 6840 

ACCGGGAAGA AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG 6900 

AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG GCTGCGCCGC 6960 

CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA TGAGGGATCT TTACCGGCTC 7020

CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG AGCCAGGGAA CCGGGCTTGA GTACCCGGAG 7080 

CGTCCCGCCA GCCGAGCCAA CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC 7140

TTAGTCCTGG CGGTGTAGGG TGGGGTAGAG CACCGGGGCA GAGGGTGGGG GGTGGGCAGC 7200 

TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC ATGTTACATC 7260 

AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA CCTTCAACTT TCCAGAGCTG 7320 

CCTCTGAGGG TACTTTCTGG AGACCAAGTA GTGGTGGTGA TGGGGGAGGG GGTTACTTTG 7380 

GGAGAAGCGG ACTGACACCA CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA 7440 

TACCAAAGTC AGGGATTCTG CCCGTTTTGT TCCAAAGCAC CTACTGAATT TAATATTACA 7500 

TCTGTGTGTT TGTCAGGTTT ATCAATAGGG GCCTTGTAAT ACGATCTGAA TGTTTCCTAG 7560 

CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC TCCAGCATCA TTACTGTGTT 7620 

GGAATTTATT TTCCCTTCTG TAACATGATC AACAAGGCGT GCTCTGTGTT TCTAGGATCG 7680

CTGGGGAAAT GTTTGGTAAC ATACTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCTTT 7740 

TTCTTTACAA CCACTTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG 7800 

GAGTCCAAGG GTGGTGGAGT AAAAGAGTTG ACACATGGAA ATTATTAGGC ATATAAAGGA 7860 
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GGTTGGGAGA TACTTTCTGT CTTTOOTGTT TGACAAATGT GAGCTAAGTT TTGCTQGTTT 7920 

GCTAGCTGCT CCACAACTCT GCTCCTTCAA. ATTAAAAGGC ACAGTAATTT CCTCCCCTTA 7980 

GGTTTCTACT ATATAAGCAG AATTCAACCA ATTCTGCTAT TTTTTGTTTT TGTTTCTTGT 8040 

TTTTGTTTTG TTTGGTTTTT TTTTTTTTTT TTTTTTTTTT GTCTCAGAAA AGCTCATGGG 8100 

CCTTTTCTTT TCCCCTTTCA ACTGTGCCTA GAACATCTGG AGAACATCCC AGGGACCAGT 8160 

GAGAGCTCTG CTTTTCGTTT CCTCTTCAAC CTCAGCAGCA TCCCAGAAAA TGAGGTGATC 8220 

TCCTCGGCAG AGCTCCGGCT CTTTCGGGAG CAGGTGGACC AGGGCCCTGA CTGGGAACAG 8280 

GGCTTCCACC GTATAAACAT TTATGAGGTT ATGAAGCCCC CAGCAGAAAT GGTTCCTGGA 8340 

CACCTCATCA CACGACTACT GGACACCAGA CTAGTCCATC ACAATGTGAC ACGGTGGGAA 8400 

ACTTTCGATG TGAGCCCTGC AGTCCTTCGC TGGACCCGGG AAAAGCAACC CAATTATGGG 8460 

CTGGCCATTG AGGTGACTCA CCTCCACCAG ACACGGACCC ACCAGGGCCA GCATGTCAGA 8520 

ATCAGCCGAT CGTTACCTCA AGGGAGTGGA GATTGGGCCC AACTCCGCCC CCTCCTGGTC 8580 

ACTTTTGGCC ATGATGGCCG GGGCCATACC TTGACCCGCA GGAGGGCCAA ACGTAGTCCC 8640 

AAGCATCACC CACAGCGGTC CAGGAAGAAG AATAAGAACT GCCGTCGCCA TTCACTATAC 8700 

GTGGACTTCA GTGACGTGGG CTGGAATGAT TGGATTGTGG CCCCACCCGG CTACCAGGCC 8760 

TTCTACTGCC ATGGGGACTG TCCCTTTCCA CTGGCTGATC ACCTCAACTC AACCAACCAT 8820 

GCCATTGTGC AGACCCTAGT CAACTCTGTT AATTCTAGTA TCCCTAAGGC CTGTTGTGTC 8880 

CCCACTGAAC TGAGTGCCAT TTCCATGTTG TACCTGGATG AGTATGACAA GGTGGTGTTG 8940 

AAAAATTATC AGGAGATGGT GGTAGAGGGG TGTGGATGCC GCTGAGATCA GACAGTCCGG 9000 

AGGGCGGACA CACACACACA CACACACACA CACACACACA CACACACACA CACGTTCCCA 9060 

TTCAACCACC TACACATACC ACACAAACTG CTTCCCTATA GCTGGACTTT TATCTTAAAA 9120 

AAAAAAAAAA GAAAGAAAGA AAGAAAGAAA GAAAAAAAAT GAAAGACAGA AAAGAAAAAA 9180 

AAAACCCTAA ACAACTCACC TTGACCTTAT TTATGACTTT ACGTGCAAAT GTTTTGACCA 9240 

TATTGATCAT ATTTTGACAA ATATATTTAT AACTACATAT TAAAAGAAAA TAAAATGAG 9299 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CGGATGCCGA ACTCACCTA 19 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTACAAACCC GAGAACAG 
(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCCGGCACGA AAGGAGAC 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GAAGGCAAGA GCGCGAGG 
Claims 

1. A system for identifying osteogenic agents comprising a recombinant host cell 
modified to contain an expression sequence comprising a promoter derived from a gene 
encoding a bone morphogenic protein operatively linked to a reporter gene encoding an 
assayable product. 

2. The system of claim 1 wherein said bone morphogenic protein is selected from 
the group consisting of the BMP-2 and BMP-4 proteins. 

3. The system of claim 1 or 2 wherein said reporter gene comprises a gene 
encoding the production of an assayable product selected from the group consisting of 
firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green fluorescent 
protein, human growth hormone, alkaline phosphatase and β-glucuronidase. 

4. The system of claim 3 wherein said reporter gene comprises a gene encoding 
the production of firefly luciferase. 

5. A method for identifying an osteogenic compound comprising the steps of: 

culturing the cells of any of claims 1-4 under conditions which permit expression of 
said assayable product from said reporter gene; 

contacting said cells with at least one candidate compound suspected of possessing 
osteogenic activity; 

measuring the amount of assayable product produced in the presence of said 
candidate compound and comparing said amount to the amount of assayable product 
produced in the absence of said candidate compound; and 

identifying, as an osteogenic compound, a candidate compound that enhances the 
amount of said assayable product when present 
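
The comparison step of claim 5 can be illustrated with a minimal sketch. It assumes the assayable product is read out as a numeric signal (for example, luciferase light units) from replicate wells. The function names, the fold-induction threshold, and the readings are all hypothetical and appear only for illustration; they are not taken from the specification.

```python
from statistics import mean


def fold_induction(treated, untreated):
    """Ratio of mean reporter signal with the candidate to mean signal without it."""
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated signal must be positive")
    return mean(treated) / baseline


def identify_osteogenic(readings, untreated, threshold=1.0):
    """Return candidates whose fold induction exceeds the (assumed) threshold."""
    hits = {}
    for name, treated in readings.items():
        ratio = fold_induction(treated, untreated)
        if ratio > threshold:
            hits[name] = ratio
    return hits


# Made-up reporter readings, used only to exercise the functions.
untreated_wells = [100.0, 110.0, 90.0]
candidate_wells = {
    "compound_A": [300.0, 310.0, 290.0],
    "compound_B": [95.0, 105.0, 100.0],
}
hits = identify_osteogenic(candidate_wells, untreated_wells, threshold=1.5)
```

With these illustrative numbers only the hypothetical compound_A exceeds the assumed threshold. In practice the cutoff and any statistical test would be chosen by the experimenter.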
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CCCGGTCTCA GGTATCA 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CAGGCCGAAA GCTGTTC 
6. An isolated nucleic acid molecule comprising a nucleotide sequence encoding 
the promoter region of a gene encoding bone morphogenetic protein selected from the 
group consisting of the BMP-2 and BMP-4 proteins. 

7. The nucleic acid molecule of claim 6 which corresponds to a nucleotide 
sequence selected from the group consisting of positions -2372 to +316 of the BMP-4 gene 
depicted in Figure 1C (SEQ ID NO:3), a portion thereof which encodes a biologically 
active promoter, the BMP-2 sequence depicted in Figure 11, and a portion thereof which 
encodes a biologically active promoter. 

8. A recombinant expression vector comprising the nucleotide sequence of claim 
6 or 7. 

9. The recombinant expression vector of claim 8 wherein said nucleotide 
sequence is operatively linked to a reporter gene encoding an assayable product. 

10. The recombinant expression vector of claim 9 wherein said reporter gene 
comprises a gene encoding the production of an assayable product selected from the group 
consisting of firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green 
fluorescent protein, human growth hormone, alkaline phosphatase or β-glucuronidase. 
1 GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT 
51 TTCCACCACA AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC 
101 TTCAGAGAAA TTGGGCCAAA CTTGAGGAAA GTTCCTGGGA AAGGCTTTT^ 
151 AGCAGCACCT CTCTGGGCTA CAAAAAAGAA GCCAGCAGGC ACCACCAAGG 
201 TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC TTGATTACTA 
251 AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 
301 AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT 
351 TCTTGGGGTT GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA 
401 ACTTTCTCAT TTAAATCTCA TATAGGTTCG GAGTTTCTTG CTTTGCTCCT 
451 TCCGCCTCCG CGATGACAGA AGCAATGGTT AACTTCTCAA TTAAACTTGA 
501 TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA CTTACACACT 
551 TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 
601 TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG 
651 ATCGCAAATT GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG 
701 ACAGCCTTCC TTCAAAAATA CCTTATTTGA CCTCTACAGC TCTAGAAACA 
751 GCCAGGGCCT AATTTCCCTC TGTGGGTTGC TAATCCGATT TAGGTGAACG 
801 AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA CACGTGGGTA 
851 AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 
901 AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA 
951 CAGGAATGCT CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG 

1001 GTACTATAAG GCTCCTGAAT GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA 
1051 ATCCTGGCTC AGCAGCTTGG GGACATTTCC AGCTGAGGAA GAAAACTGGC 
1101 TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG AGAGAGGACC 
1151 AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 
1201 TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG 
1251 ATCGCTTCTA TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG 
1301 ACACACACAC ACACACACAC ACACACACAC ACACATCACT AGAAGGGATG 
1351 TCACTTTACA AGTGTGTATC TATGTTCAGA AACCTGTACC CGTATTTTTA 
1401 TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT TTTATTAGAT 
1451 TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 
1501 CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA 
1551 CAGAAGCGTT TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT 
1601 CTGTTCTTCT TTTTAATGTG CTCTTACCTA AAAACTTCAA ACTCAAGTTG 
1651 ATATTGGCCC AATGAGGGAA CTCAGAGGCC AGTGGACTCT GGATTTGCCC 
1701 TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG GTCGGCTTCA 
1751 CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 
1801 CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG 
1851 CCCGCCCGGG GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC 
1901 ACGGCCGGGA GGGAGCCGGC AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG 
1951 CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG TCCTCGCCCT GCCGCGCAGA 
2001 GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG CCCGCCCGGG 
2051 ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC 
2101 CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC 
2151 GACACGGGTT GGAACTCCAG ACTGTGCGCG CCTGGCGCTC TGGCCECGGC 
2201 TGTCCGGGAG AAGCTAGAGT CGCGGACCGA CGCTAAGAAC CGGGAGTCCG 
2251 GAGCACAGTCT TTACCCTCAA TGCGGGGCCA CTCTGACCCA GGAGTGAGCG 
2301 CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT GCCACAAAAG 
23 51 acacttggcc; cgagggctcg; GAGCGCGAGG TCACCCGGTT TGGCAACCCG- 
2401 ag acgcgcgg: cxggacegtc: TCGAGAAIGA. gccccaggac gccggggcgc: 

2451 CGCAGCCGTC CGGGCTCTG CI TGGCGAGCGC TGATGGGGGT GCGCCAGAGTI 

2501- CAGGCTGAGG^ GAGTGCAGAG: TGCGGCCCGC CCGCCACCCA. AGATCEXCGC: 

2551 TG CGCCCTTG" CCCGGACACG" GCATCGCCCA. CGATGGCTGC CCCGAGCCATT 

2 601 GGGTCGCGGd CCACGTAACG* CAGAACGTCC GTCCTCCGCC CGGCGAGTCC 

2651 CGGAGCCAGC CCCGCGCCCC* GCCAGCGCTG GTCCCTGAGG CCGACGACAG 

2701 CAGCAGCCTT GCCTCAGCCT" TCCCTTCCGT CCCGGCCCCG CACTCCTCCC 

2751 CC7GCTCGAG GCTGTC-TGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC 
2801 CGGGA'3AGTG ACTTGGGCTC CCCACTTCGC GCCGGTGTCC TCGCCCGGCG 
2851 GATCCAGTCT TGCCGCCTCC AGCCCGATCA CCTCTCTTCC TCAGCCCCCT 
2901 GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC GGAGAAGGAG 
2951 GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCCTC 

30 01 GGACAGAGCT TTTTCCATGT GGAGACTC7C TCAATGGACG TGCCCCCTAG 
3051 TGCTTCTTAG ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG 
3101 GACCCGGGGT TGGCTGGCGG GTGACACCGC TTCCCGCCCA ACGCAGGGCG 
3151 CCTGGGAGGA CTGGTGGAGT GGAGTGGACG TAAACATACC CTCACCCGGT 

32 01 GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA CCCCAGATCC 
3 2 51 CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 
3 3 01 GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC 

33 51 AGAAATCCAT TTAAGAGTAT GG CCAGTAG A TTTTACTAGT TCATTGCTGA 
3 401 CCAGTAAGTA CTCCAAGCCT TAGAGATCCT TGGCTATCCT TAAGAAGTAG 

34 51 GTCCATTTAG GAAGATACTA AAAGTTGGGG TTCTCCATGT GTGTTTACTG 
3 501 ACTATGCGAA TGTGTCATAG CTTACACGTG C ATT CAT AAA CACTATCTAT 
3 551 TTAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 
3 6 01 GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC 
3 651 GAGGCCACAA CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG 
3 7 01 CTTACTGAAT CTACAAGTTT GATATGCTCA ACTACCAGGA AATTGTATAC 
3 751 AGCGCCTCTA AGGAAGTCAC TTGTGCATTT GTGTCTGTTA ATATGCACAT 

38 01 GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG ACCAACCTAT 
3 851 GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 

3 9 01 GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA 

39 51 AGTAAGAAGT CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC 

40 01 .TATGGGAGCC. GAGGCGCGGG GGCGGCGGAG GACTGGGCGG GGAACGTGGG 
40 51 TGACTCACGT CGGCCCTGTC CGCAGGTCGA CCATGGTGGC CGGGACCCGC 
4101 TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG GCGCGGCCGG 
4151 CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 

4 2 01 CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG 
4 251 CTCAGCATGT TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT 
43 01 GGTGCCCCCC TATATGCTAG ATCTGTACCG CAGGCACTCA GGCCAGCCAG " 

43 51 GkGCGCCCGC CCCAGACCAC CGGCTGGAGA GGGCAGCCAG CCGCGCCAAC 
4*4 01 ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG GTGGCGGGGC 

44 51 GGGGXCGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGC CCTCCACTAG 
4 501 CACAGTAGAA GG CCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC 
4 551 CAGGGATTCC CCGCTTGTGA GTCCTCACCC TTTCCTGGCA AGTAGCCAAA 
4 601 AGACAGGCTC CTCCCCCTAG AACTGGAGGG AAATCGAGTG ATGGGGAAGA 
4 651 GGGTGAGAGA CTGACTAGCC CCTAGTCAGC ACAGCATGCG" AGATTTCCAC 
4-701 AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG CTCAGATCTG 
4 7 51 TGACTTGTGTT TCACGCTGTA GTTTTAAGCT: AGGCAGAGCA AGGGCAGAAT 
4801 GTTCGGAGAT AGTATTAGCA AATCAAATCC" AGGGCCTCAA AGCATTCAAA 
4851 TTTACTGTTC ATCTGGGCCT AGTTTG AAAG' ATTTCTGAAT." CCCTATCTAA 

49 01 TCCCCGTGGG" AGATCAATTC CACAATTCGT CATATTGTTT" CCACAATGAC 
^9 51 CTTCGATTCT TTGCTTAAAT" CTTAAATCTC CAAGTGGAGA. CAGCGCAACGT 

50 01 CTTCAGATAA AAGCCTTTCT CCCACTGCCE G CT ACCTTCCT TAGGCAAGGC 
50 51 AATGGGGTTT TTAAACAAAT* ATATGAATAT." GATTTCCCAA GATAGAATAA 
5101 TGTTGTTTAT* TTCAGCTGAA ATTTCCTGGA TT AG AAAG GC TGTAGAGGCC 

5151 tattgaagtc: tcttgcaccg: atgttcigaa agcagttagt: AAAAAATCAT 
5201 GACcrAGcrc aattctgtgt: gtgccactttt caatgtgcttt txgacitaat: 

52 51 GTATTCTCCA. TAGAACATCA. GTTCCTTCAA1 GTTCTAGAAG.' AACTCAGATT 

53 01 TAAAGTTTTC CTCTGCCTTG; CTGAGGGGATT AAATTTTAAGT- TAGAAATCTA 

53 51 GG CTCTG AAA- TGATAGCCCA- ACCCCATCTCr CAGTAAGGGA" TGACTGACTC 
5401 AAACCTTGAGT AAGTCTGGGT" GATAATAGGA AAAGTCCACA. AGCAGGTCAC 

54 51 agagcgcgag: ATGGATCTGT CTTGAGGCAG CCAATGGTTA TGAAGGGCAC 
5501 ^GGAAATCCA TCTCTTTCAA ACTGGTGTCT AGGGCTTTCT GGGAGCAAAG 
is 51 CT^AGACCAC ATTCTGCTCC TCAAGGTTTG CCTACTGAAA GCAGGGAGAT 
5601 TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT GGGCTTAGCT 

5651 AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 
5701 CAAACCTACT ATCCAGCACA GGTGTTTTTC CCACTGCCTC TGGAGATATA 
5751 GCAAGAAAAC CATATATTCA TGTATTTCCT TATTAGTCTT TTCTAACGTG 
5801 AAAATTATTC CTGACCTATA AAAAATGAAG GAGGTATTTT ATCTTAACTA 
5851 AGCTAAAAGA ATCGCTTAAG TCAATTGAAA CTCAAAAATC CAATTGAATG 
5901 AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC CTTTGGAAAT 
5951 AGCTTGATAA AAACACAGAC AAAACAAAGT CTGTGTGCTT ATTTGAAAAC 
6001 TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT 
6051 CTGTTGTAAA ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTGCA 
6101 CTAACTTTAT ACCTTGGTGC AATTAGATGT AATGTTTACT GTAAATTTCA 
6151 GGAAAACCAT TTTTTTTTTT TGGTCATGAT CAGGTACACA TGGCATTTGG 
6201 GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG TTTGTTTGTT 

6251 TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 
6301 TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT 
6351 GAGACAAAAC ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG 
6401 ACAAAATCTC CTGTTGAACA TTAAGACCAT GGATTTTTAT CCAGGAGAGC 
6451 CCAGGCTTTG CTGAATCACC ACCCTCCAAC CCCACTCCAA GGTCACCGAA 
6501 GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT TGATTGACTC 
6551 CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 
6601 ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCT^TT CTTTTTCTTT 
6651 ATGGAATAAA CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA 
6701 GAGATCCTGT TAGCAGAAGT GCAACTGCCC AGAAACTAGC CACAGGCTAG 
6751 GATATTCCAA AGTACAACTC TAAAGTATGG TCCATCCTAA ATTCTAGCAT 
6801 GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT CTGGCTATTG 
6851 CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 
6901 ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC 
6951 TTTCAATCCC TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG 
7001 CCTTGTGGCA TTTCGCTTGG TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG 
7051 GTTCCCTTGT AGGATCTGTG TTCAGGAGAA CAGGGACCTT GGCAGGTTAG 
7101 TGACAACTAC CAAACCCTGC TTTCCTTCCC TGCCACTTCC TTTGTTGCCT 
7151 TAAAAATTAA ACCTTAACTC. TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 
7201 TGTCATTTAC TTT ATTTATT * TGTCATGTAC TTTATCCTGT AGAAAATCAC 
7251 AGTGTGGCCC: AAAGCCCCTT. GAATCTTGTT GCAGCGGTGA GATGCAGCTG 
73 01 CTGATCTGGA ATAGCCTTAG* G CIGTGTGTT* TGATCACAAT GCTTTCTGTC 

73 51 C AAAAGTGTG * CAAATCCTCC AAGCTTAATG ATAACTTTTG AAATGAAACT 

74 01 CACCCTACTT TAGGGCAAAC* AAGTAGCCAC AGAGAGCAGC ATCTAAACAA 
7 451 GGTCTGGTGT CCCATTTGGC TGTGTCCCTTL CAATTTTCTG TTCATTTAGC 
750 L TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT" AAGTTTTGAT" CTTCAGGGCA 
7 551. AAACTCAATC: TTCAGTTACC ATGGTATCAG. GTACCAATTC CTAGTGATTT" 
T6 01 GTGCTATGGC TTAGGATTTG* ATTTCTCTCC TACATTAGGT AATATCTTTC 
7651 AATGGCTAGA. ACTTGGGCAT" TG CA GT AC AC TCAAGTTAAC AGTTCTGTGA 
7701 CCTAAGGAAG* TCACATAACC TCTCTGAATT CTCTACTGTT* TCATTCACAA 
775 L AATGGAGAAA ATCATGGCTC TTTCTTAATG* TGCGAATTCA TAGAAAGGTG* 
7501 ATGACACCAG. ATTTGGCAGA, AGGAAGGAAA. GG AAGGAAGGJ AAGAAAGAAA- 
78 5L GAAAGAAAGA AAGAAAGAAA. GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG* 
79 01 GAGAGAGAGA GAAGGGAAGG= GAAAGGGAAA GGGAAAGGAA AGAAAAGAAA 
79 51- GGAAGGAAGG AAAGGAAGGA, AGGAAGGAAA, GAAGGAAGGA. AGGAAAAGAA1 
8 0011 AGAGAAGAAA, GCATTCAGCA TATG AACTAA. TGTXTCCTGGI TGACITTTTA1 
80511* TATCATATCCT TXGTTCEAG<^ AAGTGGCCCTT AGCCATATCT TTTGGGTTAT: 
3 10 1 TXTGAGGTACT AGGATAATCAL ACATAGTGTA GAACATTAAAi TCTGGGTLTE 

3 151. GTTTCTAGAA- G AGGCTAGAA- TGGCATGGCT: GTCCCACET& CTCCrCTTTC: - 
3 201. AGG CAGTATGT GCAGCCACCA TTCTCTCTCr AAG ATCTAGG" AGGCTGACAC 
3 251 TCAGGTTGGA GACAGGTCAG AATCCTGAAA TCACTTAGCA AGTTCAGCTG 
3 3 0L ATTCAACAAG* GG AT ATTTACT AGAGAATTAA CAGCTATTCC AGCTTCCAAA 
33 51 AAGTGTACAT TACCTACTCT GTATTTTCAG AACCC CAGGT TTGCTGTGAT 
8401 AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGATATTTT 

34 51 CATTTTCCAC CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA 

8 501 GGTGTTGTGT GTCCTGACCC TGAGGAAGTT GGCCTTGTTG AGGTCTTCTG 

35 51 TAAATTCTTG AATTGTCTGT ATAATTTCAA TGAATAGTCA TGTTTGATAC 
3 601 CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG CTGATGGAAA 
3 651 CGCTGCTGAA AGACTAGAGA TTGCTCTTTC CTTTGGCATC TGTCTTGGGT 
3 701 AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT 
3751 CTTACCTCCA GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTGCAG 
8801 TTGAAAGATG ACAATTTCAT ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG 
38 51 AG AT ATT ATT CTGTTACTTC AATGACCTGT CTCCATTATT TATCTTGAGG 
3901 CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG AAGGCCCTGG 
8951 GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 
9001 ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT 
9051 ACTGTTTTTA TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA 
9101 CACCAGAATG GTGATGAAAG GCTTTTTATA ACAGAAGCTA AATGCAGTCC 
9151 TTCATACTTC ATGGAATGCC CCTGTCCTAA AGTACCATTA ACCGATAGTG 
9201 GAGTCAGAAC ATAAATGGCT CCCCAAAGGT ATCACCAAGA ACTTTTGG C A 

9 251 AACAGATGCA AG AG G ATT AT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 
9301 CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG 

93 51 CTTAACATAG CATGAACCCT CATGTGTTGG CCAACATTAA GG CTTTTTCT 

94 01 ATAAAAGTCT CCTCCTTCAT CAGTATACGC TCGAGTATGA AAAGCATCCT 
9451 TTTAAACCTT GACTCTGTGT GGTCCAGAAA CAGCAGCATC CCTTGCTTAA 
9 501 GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA CCGGCTGATG 
9551 TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 
9 601 TGAGGATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA 
9 651 TCGTGAATTG CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC 
9701 TATGTCCCTG TGGATGCCTT TGTCTCTAGA GTTCTGAGCA AAATGGTAGG 
9751 GTGTG CTTTG CAAAATGTCA TCATTGATGT TGAATTTCAA AGTCTTTAAT 
9801 TAAGGGGCTG AAATCTGTAT ATTGAGATTT GTAAATCATC TAAATTGTAG 
9851 AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 
9901 TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTCAGTCA TTAGCTCTCT * 
9951 ACTAAGTACA GTGCTGACTT TTTTAAAATT AAAGTCTGTG AATTCCAAAG 

10 001 AAGTGTTTCA CTATTTCCTC CATTATTATA GCTACCTAGA AG CTATGTTC 

10051 ATATATTGGA TTAAAAACGT AGCAATTACA AAGTTAATGT GGCCATATAG 

10101 AAAAGGGAAA AGAAACTCCG CTTTCACTTT AATATATATA TGTGTGTGTG 

10151 TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 

10201 TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC 

10251 TCAGGCCATG GACAGAGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT 

103 01 TAATTTCTGG TTGCAGGTTT TTATGTAGTT TAACTTAAAC. CATGCACTGA 

103 51 AGTTTTAAAH GCTCGTAAGG AATTAAGTTA CCATTGGCTC TCTTACCAAA 

104 01 TGCGTTTCrr TTTTCTCTCC ACCCTGATCA AACTAGAAGC CGTGGAGGAA 
104 5 1_ CTTCCAGAGA TGAGTGGGAA AACGGCCCGG. CGCTTCTTCT* TCAATTTAAG 
10501. TTCTGTCCCC- AGTGACGAGT* TTCTCACATC TGCAGAACTC CAGATCTTCC 
10551 GGGAACAGAT ACAGGAAGCE TTGGGAAACA GTAGTTTCCA GCACCGAATT* 
10601- AATATTTATG AAATTATAAA GCCTGCAGCA GCCAACTTGA. AATTTCCTG-TT 
10651 GACCAGACTA TTGGACACCA GGTTAGTGAA TCAGAACACA. AGTCAGTGGG 
10701 AGAGCTTCGA CGTCACCCCA GCTGTGATGC GGTGGACCAC ACAGGGACAC 
10751. ACCAACCATCT. G GTTTGTG GTT GGAAGTGGCC CATTTAGAGC AGAACCCAGff 

108 or tgtctccaag: agacatgtga. ggaxtagcac gtctttg cac caagatgaac: 

108 51_ ACAGCTGGTC ACAGATAAGG-' CCATTGCTAG" tgactettgg= acatgatgga. 

109 01 AAAGGACATC CGCTCCACAA ACGAGAAAAC CGTCAAG CCA. AACACAAACA 
109 5 1. GCGGAAGCGC CTCAAGTCCA G CTG CAAG AC ACACCCXTTG:' TATGTGGACr: 
11001 TCAGTGATGT GGGGTGGAAT GACTGGATCC? TGGCACCTCC GGGCTATCAT 
11051 GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 
11101 CTCCACTAAC CATG CCATAG TGCAGACTCT GGTGAACTCT GTGAATTCCA 
11151 AAATCCCTAA GGCATGCTGT GTCCCCACAG AGCTCAGCGC AATCTCCATG 
11201 TTGTACCTAG ATGAAAATGA AAAGGTTGTG CTAAAAAATT ATCAGGACAT 
11251 GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA CAGCAAGAAT AAATAAATAA 
11301 ATATATATAT TTTAGAAACA GAAAAAACCC TACTCCCCCT GCCTCCCCCC 
11351 CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 
11401 GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC 
11451 GAAAAGAAGT TGGGAAAACA AATATTTTAA TCAGAGAATT ATTCCTTAAA 
11501 GATTTAAAAT GTATTTAGTT GTACATTTTA TATGGGTTCA ACTCCAGCAC 
11551 ATGAAGTATA AGGTCAGAGT TATTTTGTAT TTATTTACTA TAATAACCAC 
11601 TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA TCAGAAGAAA 
11651 TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 
11701 TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA 
11751 AAGTGCTTGC ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT 

11801 TCACAAGTTC AAGTCCAAAA AAAAAAAAAA AGGATAATCT ACTTTGCTGA 
11851 CTTTCAAGAT TATATTCTTC AATTCTCAGG AATGTTGCAG AGTGGTTGTC 
11901 CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG GATAAGAACC 
11951 AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 
12001 AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG 
12051 AAAGTGAGTG AATGGCTCTT GTTCTTTCTT AAGCCTATAA TCCTTCCAGG 
12101 GGGCTGATCT GGCCAAAGTA CTAAATAAAA TATAATATTT CTTCTTTATT 
12151 AACATTGTAG TCATATATGT GTACAATTGA TTATCTTGTG GGCCCTCATA 
12201 AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC AGCAATCTCT 
12251 CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 
12301 TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG TCAGCTTGCT 
12351 TTTCTTTCCA GGATATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA 
12401 TTTCAGATAA TAAATATCAA AATTCTGGCA TTTTCATCCC TATAAAAACC 
12451 CTAAACCCCG TGAGAGCAAA TGGTTTGTTT GTGTTTGCAG TGTCTACCTG 
12501 TGTTTGCATT TTCATTTCTT GGGTGAATGA TGACAAGGTT GGGGTGGGGA 
12551 CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 
12601 GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG 
12651 AAAGCCAATG CCTAAATCAA AGCAAAGTTT GCAGAACCCA AGGTAAAGTT 
12701 CCAGAGATGA TATATCATAC AACAGAGGCC ATAGTGTAAA AAAATTAAAG 
1275L AATGTCTGAT CAGCGTCTCA GCACATCTAC CAATTGGCCA GATGCTCAAA 
128 OL CAGAGTGAAG TCAGATGAGG' TTCTGGAAAG TGAGTCCTCT ATGATGGCAG 
12851 AGCTTTGGTG CTCAGGTTGG. AAGCAAAACC TAGGGAGGGA GGGCTTTGTG* 
12901 GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG 
129 5 L TCAGTTTCCC G AGTGTGTGTL CCTGTAGCCC TCCGTCATGG TTGAAGCCCA 
1300L GGTCTCACCT CCTCTCCTGA CCCGTGCCTTT AGAACTGACT" TGGAAAGCGG 
13051 TGTGCTTACA GCAAGACAGA CTGTTATAAT* TAAATTCTTC CCAAGGACCT 
13101 CCGTG CAATG* ACCCCAAGCA. CACTTACCTH CGGAAACCTT AAGGTTCTGA 
13151. AGATCTTGTT TXAAATGACT ACCCTGGTTA G CTTTTG ATG" TGTTCCTTAT" 
13201. CCCTTTAGTT: GTTGCACAGG TAGAAACGAr TAGACCCAAC TATGGGTAGC 
13 251 CTTGTCCTCC TGGTCCTTCA GTCATTCTCT AATGTCTCTT." GCTTGCCATG 
1330L GG CACTGT AA CAAACTGCAA T CTT AA CAT CT ' TT AT AAAATG" AATGAACCAC 
133 5L ATATTTACAT CTCCAAGTCC TCCAGATGGG* AGTGCGATCA TTCCATAAGG' 
13 4 01. ATCCCACCTT CTGGCAGGTC TATCCAGTAC ATATTTTATC CTTCATTGGT: 
13 4 51. CTTGATTTTCT TTGGCTAAAA TTACTTGTACr CACAG CAGGC CCCATGTGAC 
13 50L ATATAGGTAX ATACATACAX GTATGTGCAX ATAGTGTGTA CATGTTCTAA 
13 551. TTTATACATA GCTATGTGAA GATTATGTTAL CATATCTAGA TGGXCGCACX 
13 601- TCIGATETCCT ATTTAGGTTC: AGAGAGAGACT GTCACAGTAA- ATGGAGCIAX 
13651. GTCATTGGTA TAXCCCCGAG*. TGGTTCAGGTT GTTCTCrCTA TTTTTTTAAG: 
13701. ATGGAGAACA CTCATCXGTA CTAXCGAAAA1 CEGAGCCAAA XCACCTAGCA 
13751. AATTTCTAGTT CACTGCCTTC CEGTTAAGA1T ACXGATXCAC TGGGTGCXGA- 
128 OL CATGCTGAGO CCTGCCTACT' TTTGCATGAA GGACAAGGAA GAGAGCTTGC 

138 5L AGTTAAGAAX GGTATATGTG GGGOTAGGGGT GCGGCGTATA GACTGGCATA 

139 01 TATGTGAAGG- AAGGTCACAA ACAGCCTGCA CTAATTTCCC TTTTCTGGTT 
139 51 TTATGTCTTG GCAGGGG AAA GGACAGGTAG GGTGGGGTTG AGGGGGAGGG 
14001 CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT TCACCCCGCC 

14 051 A C CAT AT CTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACC^ 

14101 GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGC 

14151 CAAAGTTCTA AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA 

14 201 GGAATGCCTC CAGGAAAGCA AAAAGCTTGA TGTGTGTACA GCCACGTGGT " 

14 251 GGAGTCCTGC CACCCTATGA TTCCTGTCCC AGTGGTCGTG TGGGGCCTGA 

14 301 GATCCTGAAT TTCTAATGAG CTCCCAGTAC GCCCTGACTC ACTGTGCCAG 

14351 AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT 

14401. ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA 

14451 TGAGAAGAAA CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC 

14 501 CCTGCAGTTT TGTTTTTTGT TTTTTTTTTT TCAAAGACAT TCtTTTTCCC 

14551 AACAAGATGA GTGGCAATCT TATGTTCTAG CCACTCTTAG ACATGAAAAC 

14 601 ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGCTTG CTTGGGCACG 

14 651 CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATGAGGG 

14701 AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAG 

14751 TCAGTGTGCT GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA 

148 01 AAGGGATCTC TTTGAAGGCA GAGGGTAGAT GTCGTATGGT TGAAGCATTT 

14 851 GTTTATACTA AAATGATGCT TGACTTTTTT TCTAAGTTAT AAGACAGTAC 

14901 ACTGTATAAG TTCATTGAAC . CTAGAGGGTG GCATAGGACT CCAAATCTGG 

14951 TATGGGAGGT TTGTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT 

15001 GGAATAAAGT GCTTATGTGA ATGGGCTTAA G CT AG GG AAA AAAATGGGTT 

15051 TCCCTCTGCA AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC 

15101 ATGAATGCCA CTTGTTAGCA GATGCCCTGT GGGGATCCGA ATTC 
1 GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG 
SI CTCCCTCTG7 CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG 
101 GCTAGTTTGT ATCCATCTAA ATTGGGGAAG AAAGAAGTAC AGC7GTCCCC 
151 AGAGATAACA GCTGGGTTTT CCCATCAAAC ACCTAGAAAT CCATTTTAGA 
201 TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC AGACTGGGTT 
251 TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 
301 ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG 
351 AACAAACACC ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA 
401 GGCAAAGCTG CACCCCTAAG GACAACGAAT CGCTGCTGTT TGTGAGTTTA 
451 AATATTAAGG AACACATTGT GTTAATGATT GGAGCAGCAG TGATTGATGT 
501 AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT ATGGGAGCAC 
551 AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 
601 GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT 
6 51 GTTAGAAGCT GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT 
701 TGCCATACCT TGATTAATCG GAGATTAAAA GAGAAGGTGT ACTTAGAAAC 
751 GATTTCAAAT GAAAGAAGGT ATGTTTCCAA TGTGACTTCA CTAAAGTGAC 
801 AGTGACGCAG GG AATCAATC GTCTTCTAAT AGAAAGGGCT CATGGAGACC 
851 TG AG CTG AAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 
9 01 AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC 
951 AGGATGTTGT GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT 

1001 CACCCTTTGT CTCTGGCCAG TAGAATACAG GAACTCGTTC CTGTTTTTTT 
1051 TTTTTTAAAT TCTGAAGGTG TGTAAGTACA AAGGTCAGAT GAGCGGCCCT 
1101 AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC CACCCCAGAA 
1151 ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 
1201 CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG 
1251 GCTGTGATTT CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT 
13 01 TCCCCATCCT CGTTTGTAAA AACAAAGATG AAGCTGATAG TTCCTTCCCA 

13 51 GCTCCATCAG AGGCAGGGTG TGAAATTAGC TCCTGTTTGG GAAGGTTTAA 

14 01 AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA ACTCTTGTTT 
14 5 L CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 
1501 TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG 
1551 GCACTGTGTT ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA 
1601 AATAACCATC* TGGCAGTTAA GAAGGCTTCA GAGATATAAA TAGGATTTTC 
1651 TAATTGTCTT ACAAGGCCTA GGCTGTTTGC CTGCCAAGTG CCTGCAAACT 
3.701 ACCTCTGTGC ACTTGAAATG TTAGACCTGG' GGGATCGATG GAGGG CACCC 
L751 AGTTTAAGGG GGGTTGGTGC AATTCXCAAA TGTCCACAAG. AAACATCTCA 
1801 CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG^ AAGAGGTATC 

18 51 TCCTTCGGGC ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC 
190L CTTTAACAGT TTACGGAAGG CCACCCTTTA AACCAATCCA ACAGCTCCCT" 

19 5 L TCTCCATAAC: CTGATTTTAG AGGTGTTTCA TTATCTCTAA TTACTCGGGG 
2001 TAAATGGTGA TTACTCAGTG' TTTTAATCAT* CAGTTTGGGC AGCAGTTATT 
205L CTAAACTCAG". GGAAGCCCAG. ACrCCCATGC GTATTTTTGC AAGGTACAGA 
2101. GACTAGTTGG* TGCATGCTTT CTAGTACCTC TTGCATGTGC TCCCCAGGTG 
215 1_ AGCCCCGGCH GCTTCCCGAG. CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA 
2201. ACTGAGGGCE GGGGAGCTGT GCAATCTTCC GGACCCGGCC- TTGCCAGGCG 
2251 AGGCGAGGCC CCGTGGCTGG* ATGGGAGGAT. GTGGGCGGGG CTCCCCATCC 
2301 CAG AAGGG G A . GGCGATTAAG* GGAGGAGGGA AG AAGGG AG C GGCCGCTGGG. 
23 5 L GGGAAAGACTT GGGGAGGAAG* GGAAGAAAGA. GAG G GAGGG A_ AAAGAGAAGG? 
24011 AAG G AGTAG A. TGTGAGAGGG- TGGTGCXGACT GGTGGGAAGG; CAAGAGCGCC? 
2451. AGGCCTGGCC CGGAAGCTAG" GTGAGTTCGGT CATCCGAGCTT GAGAGACCCC 
250 L AG CCTAAG AC GCCTGCGCTG" CAACCCAGCC TGAGTATCTGI GTCTCCGTCC: 
2551. CTGATGGGATT TCTCGTCTAA ACCGTCTTGG: AGCCTGCAGCT GATCCAGTCT 
2601. CTGGCCCTCG- ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC CCAGAAGCAG 
265L CTGCTGGCGA GCCCGCTTCT G CAGGAA CCA ATGGTG AG CA GGGCAACCTG 
27QL GAGAGGGGCG CT ATT CTG AG GATTCGAGGT G C ACCCGT AG" TAGAAGCTGG 
2751 GGATGGGGCT CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC 
CTTCTCCAAC AGTGTTGGAG GTGGGATGAT GGAGGCTAAA AGGCACCTCC 
ATATATGTTA CTGCGTCTAT CAACCTACTT TAGGGAGGTG CGGGCCAGGA 
GAGGCGGGAA GGAGAGAAGG CCTTGGAAGA GAGGTCATTG GGAAGAACTG 
TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 
GGAGTCAACT CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT 
TA GGAG AGAG AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC 
TGCTTTGCTC TAAAGCGCGA CCCGGGATAG AGAGGCTTCC TTGAGCGGGG 
TGTCACCTAA TCTTGTCCCC AACGCACCCC CTCCCAGCCC CTGAGAGCTA 
GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC TATTTTCTTA 
GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 
AGGGGTGTAA GGTTCCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC 
TGTGCGGGTC TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT 
AATGAGGAGC TTGCGATACT CCAAAGGGTT CGGGAATTGC GGGGTCCTTA 
CACGCAGTGG AGTTGGGCCC CTTTTACTCA GAAGGTTTCC GCCACGGCTT 
TGGTTGATAG TTTTTTTAGT ATCCTGGTTT ATGAACTGAA GGTTTTGTGA 
GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 
AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC 
TCCAGGAGCT GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG 
AAAATTTACT TTGATCTGTA TTTTTTAATT AAAAAAAATC AGGGAAGAAA 
GGAGTGATTA GAAAGGGATC CTGAGCGTCG GCGGTTCCAC GGTGCCCTCG 
CTCCGCGTGC GCCAGTCGCT AGCATATCGC CATCTCTTTC CCCCTTAAAA 
GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 
AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA 
AAGCTGCGCT GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG 
GTCTCTGCAC CCAGGGCAGC AGTGTGGGAT GGCGCTGGGC AGCCACCGCC 
GCCAGGAAGG ACGTGACTCT CCATCCTTTA CACTTCTTTC TCAAAGGTTT • 
CCCG AAAGTG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG GGGGGGGGGA 
GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 
AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC 
CTCGGATGGA CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG 
AGGTTAGAAC AGGCCAGGCA TCTTAGGATA GTCAGGTCAC CCCCCCCCCC" 
AACCCCACCC GAGTTGTGTT" GGTGAATTTC TTGGAGGAAT CTTAGCCGCG 
ATTCTGT AG C TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA GGAAGTGGCT 
GTGCGGGGGT GGCGGTGGGC GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 
AGAGGGAGAG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGff TTTGTAGACG 
C TTGG GATCG CGCXTGGGGT: CTCCTTTCGT GCCGGGTAGG* AGTIGTAAAG 
CCTTTGCAAC TCTGAGATCG- TAAAAAAAAT GTGATGCGCT. CT TTCETT GG 
CGACGCCTGT TTTGGAATCT- GTCCGGAGTT AGAAGCTCAG* ACGTCCACCC 
CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC CGACCGGTTT 
CTGAAGGATC TGCTTGGCTG* GAGCGGACGC TGAGGTTGGC AGACACGGTG 
TG GGGA CTCT" GGCGGGG CTA- CTAGACAGTA CTTCAGAAGC CGCTCCTTCT 

aactttccca caccgcxcaa accccgacac ccccgcggcg gactgagttg 
gcgacggggt cagagtcttc; tggctgaaag. ttagatccgc taggggtcgg* 
ctgcctgtcg- ctagaagcat tatttggcct ctcggagacc cgtgtggagg: 
aagtg cxgg a. gtgtgcgagx. gtgtttgcgt gtgtgxgtgt gtgtgtgtg1t 
gtgt g tgtgtt gtgtgtgtgtt gtgcgcgcgc ccttggaggg tccctatgcg 
ctttcdttt catggaacgc tgtcgtgagg ctttggtaaa ctgtcttttc 
ggtxcctcxc tcggctgcact ctaagctttg: tcggcgctgr aaagagacgc1 
gtcttcaagt: gcaccctgaxt cctcaggctt: cagataaccc: gtccccgaac: 
ctggccagatt gcattgcaqt gcgcgccgca ggtagagacgt tgccccacctt 
cccctgcgtc cagcgaceac gaccgagagc cgcgccagtg tggtgtcccg
ccgagagttct cecagagcact gcggggacaa ctcccagacg gctggggctct
cagctgcggc cgcggaggtx ggccecgctc gcaggggctg gacccagccg
gggtcggxgg atggaggagc ggcgggcggg ctcttcggtg agtggggcgg
ggcctctggg tccacgtgac tcctag^ggc tggaagaaaa acagagcctg
tctgctccag agtctcatta tatcaaatat cattttagga gccattccgt 
5601 AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG CCTTTCCAGC

5651 AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT

5701 GTTTTCTGTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC 

5751 TCTGTCCTCC CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT 

5801 TTCAGTGCCC CGCGCCCTAC TCTCAGGCAG CGCTATGGTT CTCTTTCTGG

5851 TCCCTGCAAG GCCAGACACT CGAAATGTAC GGGCTCCTTT TAAAGCGCTC

5901 CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG AGCGCGAGGG 

5951 ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCCCTT GAAGGGCTAA 

6001 CCACTCCCTT ACCAGTCCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC 

6051 CTTGCCTCAT AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG 

6101 AACAAGACTC TTTAGTGAGC ATTTTCAACG CAGCGACCAC AATGAAATAA 

6151 ATCACAAAGT CACTGGGGCA GCCCCTTGAC TCCTTTTCCC AGTCACTGGA 

6201 CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT CTCCCCTCCT 

6251 CCTGTTCTTA ACCAGCTGGA AGTTGfGGAA ATTGGGCTGG AGGGCGGAGG 

6301 AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG 

6351 TCCGAAGTGA AGAGCGATGG CATTTTAATT CTCCCTCCNC CTCCCCCCTT

6401 TACCTCCTCA ATGTTAACTG TTTATCCTTG AAGAAGCCAC GCTGAGATCA
6451 TGGCTCAGAT AGCCGTTGGG ACAGGATGGA GGCTATCTTA TTTGGGGTTA
6501 TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA TTCTTACTTT 
6551 CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 
6601 GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT 
6651 TGCTACCCCT ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT 
6701 CGGTGTTTAT TTATTCTTTA ACCTTCCACC CCAACCCCCT CCCCAGAGAC
6751 ACCATGATTC CTGGTAACCG AATGCTGATG GTCGTTTTAT TATGCCAAGT

6801 CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG ACCGGGAAGA
6851 AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG
6901 AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG 

6951 GCTGCGCCGC CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA
7001 TGAGGGATCT TTACCGGCTC CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG
7051 AGCCAGGGAA CCGGGCTTGA GTACCCGGAG CGTCCCGCCA GCCGAGCCAA 
7101 CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC TTAGTCGTGG
7151 CGGTGTAGGG TGGGGTAGAG CRCCGGGGCA GAGGGTGGGG GGTGGGCAGC
7201 TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC 
7251 ATGTTACATC AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA
7301 CCTTCAACTT TCCAGAGCTG CCTCTGAGGG TACTTTCTGG AGACCAAGTA
7351 GTGGTGGTGA TGGGGGAGGG GGTTACTTTG GGAGAAGCGG ACTGACACCA
7401 CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA TACCAAAGTC 
7451 AGGGATTCTG CCCGTTTTGT TCCAAAGCAC CTACTGAATT TAATATTACA 
7501 TCTGTGTGTT TGTCAGGTTT ATCAATACGG GCCTTGTAAT ACGATCTGAA 
7551 TGTTTCCTAG CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC
7601 TCCAGCATCA TTACTGTGTT GGAATTTATT TTCCCTTCTG TAACATGATC
7651 AACAAGGCGTT GCTCTGTGTT TCTAGGATCG CTGGGGAAAT GTTTGGTAAC
7701 ATAGTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCTTT TTCTTTACAA
7751 CCACTTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG
7801 GAGTCCAAGG GTGGTGGAGX AAAAGAGTTG ACACATGGAA ATTATTAGGC
7851 ATATAAAGGA GGTTGGGAGA TACTTTCXGT CTTTGGTGTT TGACAAATGT
7901 GAGCTAAGTZ TTGCTGGTTT GCTAGCXGCT CCACAACTCX GCTCCTTCAA
7951 AOTCAAAAGGC ACAGTAATTT CCTCCCCTTA GGTTTCTACT ATATAAGCAG
8001 AATTCAACCA ATTCTGCTATT mTlUTlT TGTTTCXTGX TTTTGTTTTG

8051 TTTGGrrrrr tttttttttt tttttttttt gtctcagaaa agctcatggg
8101 ccrrrrerrE tcccctttca actgtg ccta. v g aacatctggt agaacatccc

8151 AGGGACCAGTT GAGAGCTCTG CmTCGTTT CCTCTTCAAC CTCAGCAGCA

8201 TCCCAGAAAA TGAGGTGATC TCCTCGGCAG AGCTCCGGCT CTTTCGGGAG

8251 CAGGTGGACC AGGGCCCTGA CTGGGAACAC GGCTTCCACC GTATAAACAT

8301 TTATGAGGTT AXGAAGCCCC CAGCAGAAAT GGTTCCTGGA CACCTCATCA

8351 CACGACTACT GGACACCAGA CTAGTCCATC ACAAT<?TGAC ACGGTGGGAA
8401 ACTTTCGATG TGAGCCCTGC AGTCCTTCGC TGGACCCGGG AAAAGCAACC
8451 CAATTATGGG CTGGCCATTG AGGTGACTCA CCTCCACCAG ACACGGACCC
8501 ACCAGGGCCA GCATGTCAGA ATCAGCCGAT CGTTACCTCA AGGGAGTGGA
8551 GATTGGGCCC AACTCCGCCC CCTCCTGGTC ACTTTTGGCC ATGATGGCCG
8601 GGGCCATACC TTGACCCGCA GGAGGGCCAA ACGTAGTCCC AAGCATCACC
8651 CACAGCGGTC CAGGAAGAAG AATAAGAACT GCCGTCGCCA TTCACTATAC
8701 GTGGACTTCA GTGACGTGGG CTGGAATGAT TGGATTGTGG CCCCACCCGG
8751 CTACCAGGCC TTCTACTGCC ATGGGGACTG TCCCTTTCCA CTGGCTGATC
8801 ACCTCAACTC AACCAACCAT GCCATTGTGC AGACCCTAGT CAACTCTGTT
8851 AATTCTAGTA TCCCTAAGGC CTGTTGTGTC CCCACTGAAC TGAGTGCCAT
8901 TTCCATGTTG TACCTGGATG AGTATGACAA GGTGGTGTTG AAAAATTATC
8951 AGGAGATGGT GGTAGAGGGG TGTGGATGCC GCTGAGATCA GACAGTCCGG
9001 AGGGCGGACA CACACACACA CACACACACA CACACACACA CACACACACA
9051 CACGTTCCCA TTCAACCACC TACACATACC ACACAAACTG CTTCCCTATA
9101 GCTGGACTTT TATCTTAAAA AAAAAAAAAA GAAAGAAAGA AAGAAAGAAA
9151 GAAAAAAAAT GAAAGACAGA AAAGAAAAAA AAAACCCTAA ACAACTCACC
9201 TTGACCTTAT TTATGACTTT ACGTGCAAAT GTTTTGACCA TATTGATCAT
9251 ATTTTGACAA ATATATTTAT AACTACATAT TAAAAGAAAA TAAAATGAG
bntp2p

GAATTCATTTAAACTv VITCACTTCTAGGTCCCATGCGTTTACAC1 % 

TTCCACCACAAGAGGGCAGCCATCTCrrAAAAAAACAACACT 

TTCAGAGAAATTGGGCCAAACTTGAGGAAAGTrc^ 

AGCAGCACCTCTCTGGGCTACAAAAAAGAAGCCAGCAGGCA 

TGGAGTAACTCTCCAQAGGCATCCATTITACCTCAG 

AGGATATCCTAAAOGGCOIAACTCTCTCTTCTGGTCCT 

AGCTCCAAGGCATTGTTGATGTCATCACCAAAGGTT^ 

TCTTGGGGTTGGTCCAACAGCTGTCAGCTTrCT . 

ACTTTCTCATTrAAATCTCATATAGGTTCGGAGT T IVriUClTlXCT CCT 

TCCGCCTCQaCGATGACAGAAGCAATGGTTAACT^ 

TAGGGAAGGAAATGGCTTCAGAGGCGATCAGC^ 

TACACGTCIX5AGTGGAGTCTITTATTC 

TTCAGAGTGACAACTTCTGCAACACGTITTAAAAAGGAAT^ 

ATCGCAAATTGCTGGATCTATCCCTTCCTCTCCTTTAATTTC 

AC7VGCCTTCCTTCAAAAATACCTTATTTGACCTCT 

GCCAGGGCCTAATTTCCCTCTGTGGGTTGCTAA 

AACCTAGAGTTATTTTAG CTCCCCGACTGAAAAG CTAGCACACGTGGGTA 

AAAAAATCATTAAAGCCCCTGCTIXriX^TCTTTCTCGGTCT 

AAACTGGAAAGATCTGGTTCACAACGTAACGTTATTCACT 

ACAGGAATGCTCAGCCCATAGTTTTGGGGGTCCTGT^ 

GGTACTATGAAGGCTCCTGAATGTAGGGAGAAATGGAAAGA 

AGAATCCTGGCTCAGCAGCTTTGGGGA 

TGGCTTGGCC^CAGCCAGAGCCTTACTGCTCGA 

GGACCAGGCAGAAAATTC^VAAGGTCTCAAACCGGAAT TC 

GACTCTGGAGTAGGTGGGTGTGGAAGGGAAGATAAATATCAC^ 

AAGTCATCGCTTCTATAAAGAG^TITCTATTAACTOT 

ACATGGACAC ACAC ACACACACACACACACACACACACAC^ 

GGGATGTCCACTTTACAAGTGTGTATCTA 

ATTTTTATAATTTACATAAATAAATACATATAA^^ 

ATTAGATTCATTTATTTGAATATAAATGTATGAATATITA 

TAATGCACTCAGATGTGTATCGGCTATTTCTCGACAr 

TTCAAAACAGAAGCG TTTGCT CACATTTTr^ 

GTAAGTTCTGTTCTTCTITTrAATGTC 

CAAGTTCAATATTGGCCCAATGAGGGAACTCAGAGGCCAGTC 

ATTTGCCCTAGTCTCCCGCAGCrrc^ 

CGGCTTCACACTCATCCGGGACGCGACCCCTTAGCGGCOT 

CCGCCCCGCTCCACCGCGGCCGCCCCGTAGGGCXX^CGTCCACACCCCT 

GCGCGCCGCTCCCGCCCGCCCGGGGATCCCC^ 

GGGGAGGTGTTCEGCCACGGCCGGGAGGGAGCCGGCAGGCGG 

TTAAAAGCCX3CX3AGCGCCGCX^CACGGCXj^ 

tcctcgccctgccgcgcagagccctgctcgcactgcxk:cc^ 
cgcttcccacagcccgcccgggattggc^ 
ggcgacaccaggcaccggacgccct^ccggcgaaagac^ 
cgcggcttcgagggactggcacgacacgggttggaactccagactgtc 

CGCCTGGCGCTGTGGCCTCGGCTGTCCGGGAGAAGCT 

GACGCTAAGAACCGGGAGTCOGGAGCACAGTCCTACCCTCAATGCGGG^ 

CACTCTGACCCAGGAGTGAGOGCCCAAGGCGAGCGGGCGGAAGAGT^ 

GGACCCC^^CTGCCACAAAAGACACTTGGCCCGAGGGCT 

GGTCACCCGGTTTGG CAAC C CXSAGLACXKrGCGGCTGGACTGTCTOGAGAAT 

GAGCCCCAGGACGCCGGGGCGCCGCAGCraTG 

GCTGATGGGGGTGCGCCAGAGTCAGGCIGAGGGATGCAGM 

GCCCGCCACCCAGATCITCGCTGCGrc^ 

ACGATGGCTGCCCCGAGCCATGGGTOGCX^CCCAGCTAACGCAGAAOGTC 
CXJTCCCTCGCCCGGCGAGTCCC^^ 

GGTCCCTGAGGCCGACGACAGCAGCAGCCTTGCCTCAGCCCT 
GTCCCGGCCCCGCACTCCTCCCCCTGCTCGAGGCTGTGTGTCAGC^ 
GCTGGAGACTTCTTGAACTTGCCGGGAGAGTGAC^ 
GCGCCGGTGTCCTCGCCQ3GCGGATCC 